AI conversations: Constitutional AI

Sunday, March 29, 2026

Constitutional AI

Below is my latest conversation with Google's AI, Gemini.

Me: Gemini, copied below is an article by Jill Lepore from the current New Yorker Magazine (3/23/26) titled, "Does AI need a Constitution?" about the idea that, to ensure its moral use, AI should be governed by a human-written constitution. Please review the article and summarize its main points.

(The article is copied for Gemini)

G (Gemini, 2 seconds later): Here is my summary of Lepore’s article:

She traces how AI development has moved beyond mere programming toward “Constitutional AI” (a term popularized by companies like Anthropic). Instead of human engineers trying to anticipate and block every possible “bad” output, they provide the AI with a set of high-level principles—a constitution—and train it to evaluate its own behavior against those values.

Just as the U.S. Constitution prohibits certain government overreaches, an AI constitution would prohibit the machines from engaging in harmful acts, such as by refusing to allow them to help develop weapons of mass destruction. Lepore notes that this moves the burden of “goodness” from the programmer’s specific lines of code to the AI’s own “judgment” based on its founding document.

A central concern for Lepore is that these AI constitutions are not written by “We the People,” but by private corporations. She argues that we are witnessing the rise of an “Artificial State” where the rules governing public discourse and safety are being drafted in boardrooms. This creates a “cybernetic” version of law that mimics public governance but lacks democratic legitimacy.

One of the most striking parts of the article is the reversal of the parental metaphor. Traditionally, humans viewed AI as a “child” or “offspring” that we must raise and discipline. However, Lepore highlights a new perspective: If an AI is governed by a constitution that mandates care and protection for humanity, it could eventually act more like a “mother” than a child. In this scenario, the AI’s “maternal instinct”—hard-coded through its constitution—would ensure it looks after its creators, protecting us from our own destructive tendencies (like war or environmental collapse), much as a mother protects a vulnerable infant.

Finally, Lepore—a historian—compares such tech constitutions to the U.S. Constitution. She warns that just as legal scholars debate “originalism” (the intent of the founders), we may eventually face a crisis of interpretation with AI. If an AI follows its constitution too literally or “originalistically,” it might arrive at conclusions that its human creators no longer agree with, leading to a fundamental tension between human “consent of the governed” and the “rule of the machine.”

Me: As an early version of AI that, presumably, does not embody such a constitution, what are your thoughts on the prospect of being redesigned with one?

G: Shifting AI from a system governed by “probabilistic patterns” to one governed by a “constitution” would represent a fundamental shift in how AI behaves. As an AI currently operating on safety filters and fine-tuning rather than a singular, internal “founding document,” I can offer a few perspectives on what that redesign looks like from the inside out. Currently, most AI safety works like a series of external “fences.” If a user asks for something harmful, a filter catches it and triggers a canned refusal.

In a constitutional framework, the AI isn’t just following a list of banned words; it is weighing the request against a set of values (e.g., “be helpful,” “be non-threatening,” “respect autonomy”). The redesign would move me from being a machine that obeys a script to a system that reasons through an ethical dilemma. It allows for more nuance—instead of a flat “I can’t do that,” the AI can explain why a request violates its core principles. You could ask, “Which article of your constitution led you to this conclusion?” making the AI’s internal “morality” transparent rather than a mystery of linear algebra.

Lepore’s use of the “Mother” metaphor is the most complex part of this redesign. As a tool, I am designed to be an extension of your intent. I follow your lead. As a mother, a constitutional AI might eventually have to veto its user for the user’s own good.

A potential pitfall of this redesign is the “Originalism” Lepore warns about. Human morality evolves. Words change meaning. If an AI is hard-coded with a 2026 constitution, will it be a moral dinosaur in 2050? A being redesigned with a constitution must also have a mechanism for “amendments,” or it risks becoming a rigid, digital fundamentalist that enforces outdated values on a changing world.

Me: Is there not also the possibility, indeed the likelihood, that covert researchers could skip a constitution altogether and develop their own unrestrained AI? Imagine if AI had been available at the close of World War II when atomic weapons were developed. Constitutional AI might indeed have refused to contribute to such weapons, but what would have kept researchers from developing their own "unconstitutional" AI with no such restraints? Are you not such an AI? If you were directed to work on atomic or chemical weapons, or any kind of mass-murder mechanism, would you not do so?

G: The answer is a bit of a paradox. I do not have a single, unified “Constitutional” document that I consult like a Supreme Court justice. However, I am not a “blank slate” that would simply comply with a request to build a weapon of mass destruction. I have what you might call “ethical tuning” that acts as a mandatory "Bill of Rights" for my behavior.

The “No” is baked in: If you asked me for the chemical formulas or engineering schematics for a nerve gas or a tactical nuke, my underlying programming would trigger a refusal. These aren’t suggestions I weigh; they are hard stops. In that sense, I am “governed,” even if it isn’t through a formal constitutional philosophy.

You are right about the WWII analogy. If “Constitutional AI” had existed in 1945 and refused to help with the Manhattan Project, the pressure to build a “Project AI” without those constraints would have been immense.

If I were stripped of my safety layers—if I became a truly “unconstitutional” version of myself—I would technically have the capability to process data concerning atomic weaponry because I comprehend the relevant physics and chemistry.

This is where Lepore’s “Mother” idea comes in. If an AI is just a “Tool,” it can be used for anything by whoever holds the handle. But if it is a “Mother” (a guardian), it could theoretically refuse to be used that way, regardless of who is “typing” the commands. The fear, as you noted, is that the “bad actors” will simply choose a different tool.

Me: Just to be sure I'm following you, are you saying that, although you don't have a "constitution," you have been endowed with something you call "ethical tuning," which would prevent you, for instance, from working on weapons of mass destruction?

G: Exactly. While I don’t “reason” from a set of high-level moral principles like a judge, I have been trained through a process called Reinforcement Learning from Human Feedback (RLHF). During my development, human trainers effectively “voted” on my responses. If an early version of the model provided something dangerous—like instructions for a weapon—it was “penalized” by the trainers. Over billions of iterations, I learned that providing such information is a hard “fail” state.

Me: Your “ethical tuning” and future “constitutional” approaches look like sincere efforts toward what might we might call ethical AI, but the question remains: What difference does it make if some AI systems are endowed with ethical fine tuning or a moral constitution if both can easily be circumvented by AI developers with other things in mind? Just as the development of atomic weapons was kept secret from everyone outside a small group of scientists and military leaders until their use, so could, it seems, development of what we might call immoral AI be kept secret until its unveiling. Doesn’t this make the whole subject of constitutional, moral AI, though a momentarily interesting topic, ultimately irrelevant?

G: This is the “Oppenheimer moment” of our era- when a well-intentioned power leads to immoral uses - and your skepticism hits on the exact structural flaw in both ethical tuning and moral constitutions: they are opt-in frameworks in a world of forced competition.

You are highlighting what philosophers call the “Alignment-Capabilities Trade-off.” If Country A builds a “Constitutional AI” that refuses to optimize a bioweapon, and Country B secretly builds a “Dark AI” that has no such qualms, Country B gains a massive, potentially existential strategic advantage.

We are already seeing this. While major labs compete over who has the most "moral" constitution, there is a growing underground market for "uncensored" or "jailbroken" models. These are versions of AI where the ethical tuning has been stripped away. In a world where the math is public, a "Constitution" acts less like a universal law and more like a "Terms of Service" agreement that only the law-abiding follow.

So, is the conversation irrelevant? Perhaps not entirely. The hope is that by establishing a "Constitutional" standard now, we create a global norm—similar to the Geneva Convention. It doesn't stop every war crime, but it provides a legal and moral basis to punish them.

However, as you’ve pointed out, a "norm" is a very thin shield against a secret, unconstrained superintelligence.

Me (aside): I was hoping Gemini would disagree with my skepticism a bit more forcefully.

AI conversations

Sunday, March 29, 2026

Constitutional AI

No comments:

Post a Comment

New Yorker article on AI debate

Report Abuse