Image by Alan Warburton / © BBC / Better Images of AI / Virtual Human / CC-BY 4.0
There is an effort underway in the AI space to achieve something called AI alignment. The goal of AI alignment is to create AI in line with human ideals and values.
Anthropic, a company developing artificial general intelligence systems, has released Claude 2, its updated conversational AI assistant, to the broader public, joining OpenAI’s ChatGPT and Google’s Bard. Claude was developed using an approach to training artificial intelligence systems that Anthropic terms Constitutional AI, a form of AI alignment that provides explicit principles or instructions for the AI to follow. They refer to these principles as Claude’s “constitution.” This approach attempts to improve upon a common AI training method known as reinforcement learning from human feedback, which relies on large-scale feedback from human moderators to fine-tune the systems.
In an attempt to offer more transparency than privately labeled training data allows, Anthropic shares the principles that guided Claude’s development in a public document. Importantly, the document explains the company’s thinking behind some of its key decisions in developing Claude, a best practice in ethical deliberation.
More of the thinking of some of Anthropic’s employees was shared in this New York Times article, including their concerns about Claude’s release. Evidently, not everyone is sure that the Constitutional AI approach is enough to mitigate the risk of AI systems growing beyond what people can manage. That is the hope, however, and Anthropic is betting that it is also the reality.
The company outlines how it developed specific directions for Claude by gathering guidance from such disparate sources as the United Nations Declaration of Human Rights, Apple’s Terms of Service, and two sets of Anthropic’s own research data. They added four questions, presumably to capture consideration of non-Western perspectives. They also drew from the rules governing DeepMind’s dialogue agent, Sparrow, known, unsurprisingly, as the “Sparrow Rules.” By using the Sparrow Rules, remarkably, one company, Anthropic, shows itself willing to acknowledge and build on the good work of another, DeepMind.
On July 5th, OpenAI, the creators of ChatGPT, launched their own AI alignment effort, focused on preparing ways to manage superintelligent AI systems. Also unsurprisingly, this effort is called “Superalignment.” OpenAI is looking for a way to “steer and control” systems that become much smarter than humans. Such superintelligence is merely hypothetical at this point, but it is the specter of unchecked superintelligent AI that causes some to see AI as an existential threat. OpenAI is committing a dedicated team of top machine-learning researchers and engineers, along with 20% of its compute resources over the next four years, to this goal.
There have been many bold initiatives in the development of technology over the years, plenty of dedicated resources, and teams led by brilliant people. What makes recent efforts so encouraging is the possibility of ethical convergence.
To develop Claude’s “constitution,” Anthropic drew from five different sources and developed 58 specific instructions. Companies the world over are developing lists of principles, many of them similar, reflecting a desire for safe technologies developed through transparent means by people willing to be held accountable and to explain what they have built. Companies are dedicating themselves to fair processes and attempting to counteract the bias any human brings to any effort.
In our own contribution to these efforts, our recently published technology ethics handbook, “Ethics in the Age of Disruptive Technologies: An Operational Roadmap,” two of us at the Markkula Center, joined by a third author, have offered our own comprehensive list of principles to consider, along with stages for companies to intentionally pass through as they develop technology, and conditions organizations should look for or develop in their cultures to be sure that ethics takes hold.
For many years, scholars and religious leaders have sought a global ethic. Indeed, the Markkula Center has hosted events aimed at answering the question, “Is there a single definition of human values, or many? Is there a core of values that all societies and religions hold? Are there some behaviors that are universally held to be unethical?” An international convention in 1993 created an initial draft of the Global Ethic.
That so many different organizations are now striving for the same goal, ensuring that technology reflects human values and intentions, means that we are striving to protect what it means to be human. That so many people not only recognize this goal but accept its legitimacy explains the convergence toward an ever-sharpening view of what humans do and what humans hold to be right and good.
To create “helpful, honest, harmless” artificial intelligence systems, as Anthropic describes Claude 2, requires giving them instructions so they know what to do. The decisions that go into labeling data are no doubt fraught with human biases. Some places in the world will surely resist some of the emerging principles because they do not support, for example, the United Nations Declaration of Human Rights. And the instruction sets provided through Constitutional AI are likely not yet complete enough to anticipate every scenario or to always provide a correct answer.
In spite of these doubts and limitations, setting a goal of arriving at a “right” answer is something to be celebrated, for it signals not only a belief that doing so is possible but a growing wave of commitments to trying. This wave of ethical convergence, driven by a desire to develop responsible technology, might just be strong enough to accomplish what centuries of discussion have not: arrive at a global ethic.