AI Safety Is A Global Public Good

Chinese and Western AI scientists convene at the Berggruen Institute’s Casa dei Tre Oci in Venice, Italy.

From left: AI pioneers Stuart Russell, Andrew Yao, Yoshua Bengio and Ya-Qin Zhang at Casa dei Tre Oci in Venice, Italy. (Massimo Pistore/IDAIS)

Nathan Gardels is the editor-in-chief of Noema Magazine.

As Cold War tensions heightened in 1955, Albert Einstein and the philosopher Bertrand Russell issued a manifesto calling on eminent scientists and others from East and West to come together, “not as members of this or that nation, continent or creed, but as human beings, members of the species Man,” to warn of the existential danger posed by nuclear weapons and to propose ways to reduce that threat.

The first conference of thirty or so scientists organized in response to this appeal met in Pugwash, Nova Scotia, in 1957. Over the years, this scientifically grounded “dialogue across divides,” which came to be known as the Pugwash Conference, proved influential in achieving several milestones in the control of nuclear weapons, including the Partial Test Ban Treaty, the Non-Proliferation Treaty, the Anti-Ballistic Missile Treaty and conventions on chemical and biological weapons. For these efforts, the conference was awarded the Nobel Peace Prize in 1995.

The advent of ever more powerful artificial intelligences today, amid the geopolitical rivalry between the U.S.-led West and China, has prompted a similarly urgent exchange across divides: the International Dialogues on AI Safety, convened by the Safe AI Forum.

The effort comprises top foundational AI scientists from both China and the West, including the Turing Award winners Yoshua Bengio, Andrew Yao and Geoffrey Hinton, as well as Ya-Qin Zhang, the former president of Baidu. Its first meeting was held in 2023 at Bletchley Park, home of the World War II codebreakers, outside London. In 2024, the group met in Beijing at the Aman Summer Palace. The dialogue has just concluded its third conclave — and reached the most substantive consensus so far — at the Berggruen Institute’s Casa dei Tre Oci in Venice.

“Rapid advances in artificial intelligence systems’ capabilities are pushing humanity closer to a world where AI meets and surpasses human intelligence,” their statement reads. “Experts agree these AI systems are likely to be developed in the coming decades, with many of them believing they will arrive imminently. Loss of human control or malicious use of these AI systems could lead to catastrophic outcomes for all of humanity. Unfortunately, we have not yet developed the necessary science to control and safeguard the use of such advanced intelligence. The global nature of these risks from AI makes it necessary to recognize AI safety as a global public good, and work towards global governance of these risks. Collectively, we must prepare to avert the attendant catastrophic risks that could arrive at any time.”

The group put forward three main recommendations. They are reported here in full because of the historic significance of reaching the first-ever consensus among leading scientists from both East and West on common ground for regulating AI as a global public good.

  • Emergency Preparedness Agreements and Institutions

States should agree on technical and institutional measures required to prepare for advanced AI systems, regardless of their development timescale. To facilitate these agreements, we need an international body to bring together AI safety authorities, fostering dialogue and collaboration in the development and eventual auditing of AI safety regulations across different jurisdictions. This body would ensure states adopt and implement a minimal set of effective safety preparedness measures, including model registration, disclosure and tripwires.

Over time, this body could also set standards for and commit to using verification methods to enforce domestic implementations of the Safety Assurance Framework. These methods can be mutually enforced through incentives and penalty mechanisms, such as conditioning access to markets on compliance with global standards. Experts and safety authorities should establish incident reporting and contingency plans, and regularly update the list of verified practices to reflect current scientific understanding. This body will be a critical initial coordination mechanism. In the long run, however, states will need to go further to ensure truly global governance of risks from advanced AI.

  • Safety Assurance Framework

Frontier AI developers must demonstrate to domestic authorities that the systems they develop or deploy will not cross red lines such as those defined in the IDAIS-Beijing consensus statement [prohibiting development of AI systems that can autonomously replicate, improve, seek power or deceive their creators, or those that enable building weapons of mass destruction and conducting cyberattacks].

To implement this, we need to build further scientific consensus on risks and red lines. Additionally, we should set early-warning thresholds: levels of model capabilities indicating that a model may cross or come close to crossing a red line. This approach builds on and harmonizes the existing patchwork of voluntary commitments such as responsible scaling policies. Models whose capabilities fall below early-warning thresholds require only limited testing and evaluation, while we must move to more rigorous assurance mechanisms for advanced AI systems exceeding these early-warning thresholds.

Although testing can alert us to risks, it only gives us a coarse-grained understanding of a model. This is insufficient to provide safety guarantees for advanced AI systems. Instead, developers should submit a high-confidence safety case, demonstrating their system design achieves a low probability of harm in a transparent, explainable manner, as is common practice in other safety-critical engineering disciplines. Additionally, safety cases for sufficiently advanced systems should discuss organizational processes, including incentives and accountability structures, to favor safety.

Pre-deployment testing, evaluation and assurance are not sufficient. Advanced AI systems may increasingly engage in complex multi-agent interactions with other AI systems and users. This interaction may lead to emergent risks that are difficult to predict. Post-deployment monitoring is a critical part of an overall assurance framework, and could include continuous automated assessment of model behavior, centralized AI incident tracking databases, and reporting of the integration of AI in critical systems. Further assurance should be provided by automated run-time checks, such as by verifying that the assumptions of a safety case continue to hold, and safely shutting down a model if operated in an out-of-scope environment.

States have a key role to play in ensuring safety assurance happens. States should mandate developers conduct regular testing for concerning capabilities, with transparency provided through independent pre-deployment audits by third parties granted sufficient access to developers’ staff, systems and records necessary to verify the developer’s claims. Additionally, for models exceeding early-warning thresholds, states could require that independent experts approve a developer’s safety case prior to further training or deployment. Moreover, states can help institute ethical norms for AI engineering, for example by mandating engineers have an individual duty to protect the public interest similar to those held by medical or legal professionals. Finally, states will also need to build governance processes to ensure adequate post-deployment monitoring.

While there may be variations in Safety Assurance Frameworks required nationally, states should collaborate to achieve mutual recognition and commensurability of frameworks.

  • Independent Global AI Safety and Verification Research

Independent research into AI safety and verification is critical to develop techniques to ensure the safety of advanced AI systems. States, philanthropists, corporations and experts should enable global independent AI safety and verification research through a series of Global AI Safety and Verification Funds. These independent funds should eventually scale to meet a funding target amounting to a third of AI research and development spending.

In addition to foundational AI safety research, these funds would focus on developing privacy-preserving and secure verification methods, which act as enablers for domestic governance and international cooperation. These methods would allow states to credibly check an AI developer’s evaluation results, and whether mitigations specified in their safety case are in place. In the future, these methods may also allow states to verify safety-related claims made by other states, including compliance with the Safety Assurance Frameworks and declarations of significant training runs.

Eventually, comprehensive verification could take place through several methods, including third party governance (e.g., independent audits), software (e.g., audit trails) and hardware (e.g., hardware-enabled mechanisms on AI chips). To ensure global trust, it will be important to have international collaborations developing and stress-testing verification methods.

Critically, despite broader geopolitical tensions, globally trusted verification methods have allowed, and could allow again, states to commit to specific international agreements.

The Gravitas Of Knowledge

As with nuclear weapons during the Cold War, a divergence has emerged between the foundational scientists who first developed the technology, and who harbor the greatest fears over its use and abuse, and those scaling it up and deploying it for their own purposes or profit.

The many billions being poured into AI by Big Tech, combined with escalating national security concerns in both East and West, have decisively shifted the momentum of AI development into the hands of the most ambitious and least cautious. Only the ethical weight of a group of disinterested scientists who possess the gravitas of knowing the technology from the inside has a chance to correct that asymmetry.