Your AI Breaks It? You Buy It.

AI developers should pay for what they screw up.

Gabriel Weil is a law professor specializing in AI governance, climate change and tort law.

If you are struck by rubble from someone blasting dynamite nearby or your neighbor’s pet tiger bites off your hand, you would rightly expect them to compensate you for your injuries. And the law would back you up — even if they took reasonable precautions to keep others safe.

That’s because the law applies “strict liability” to harms resulting from activities it deems “abnormally dangerous”: activities that are uncommon and capable of generating large risks to others even when those undertaking them are careful.

In much the same way, the development of frontier artificial intelligence systems should be legally treated like housing wild animals or blasting dynamite, because its consequences can also be abnormally dangerous.

AI could soon supercharge economic growth, treat or perhaps even cure major diseases, and transform how we live and work. Still, many experts also worry it could wreak havoc or even bring about the end of human civilization.

Yet because there is no clear agreement on the magnitude or timing of such potential risks, the response to these concerns has been mixed. Some researchers have called for draconian restrictions on AI research and development. Others warn that impediments to innovation would impose great costs on humanity and undermine the strategic position of the United States in its geopolitical competition with China.

What we need is a balanced approach that allows society to capture the enormous potential benefits of AI, while minimizing the risks. Thankfully, just such a policy is available: Make AI companies pay for the harm they cause.

The principle is quite simple. If people and companies don’t need to bear the costs of the harms and risks they generate, they feel more free to engage in risky behavior. But, you may ask, doesn’t the existing tort system already impose plenty of liability? Isn’t America already an overly litigious society where people sue McDonald’s for serving their coffee hot? Well, not quite. 

Under current, broadly applicable law, AI developers are only liable for problematic outcomes if they fail to adopt well-established precautionary measures that plaintiffs can prove would have prevented the injury. If OpenAI does all the standard fine-tuning, red-teaming and evaluations before releasing GPT-5, they are unlikely to be held liable if someone then jailbreaks the model and uses it to operate a massive internet scam. This likely remains true even if engineers were aware (and they have been) that their existing guardrails were unlikely to hold.

That’s because negligence law, the only pathway to liability that is clearly available under current law, requires the plaintiff to prove a breach of the defendant’s duty of reasonable care. And the scope of the breach inquiry is typically quite narrow.

If I run someone over with my SUV, the court doesn’t ask questions like “Was it reasonable for me to be driving such a large vehicle, given my transportation needs?” or “Did the benefits to me of taking this particular car trip exceed the risks that my driving imposed on other road users?” Instead, it only asks questions like, “Was I speeding?” “Was I texting?” and “Was I drunk?” 

The same should be true for AI harm cases. The reasonableness of the underlying activity of training and deploying advanced AI systems with poorly understood capabilities and goals is likely to fall outside of the scope of the negligence inquiry. What can we do about this? Luckily, there is a tool available: strict liability for abnormally dangerous activities. 

If I’m blasting an area with dynamite and someone is struck by a piece of flying rubble or I keep a pet tiger and it bites someone, I am liable even if I took reasonable precautions. That is, when I engage in an abnormally dangerous activity, I am liable for all foreseeable harms that arise from the inherently dangerous nature of that activity. Courts can and should treat creating powerful new intelligent systems that we do not understand as such an activity. 

But there is one other problem. Some potential AI harms are so large that it would be impossible to enforce a liability judgment if they occurred. There is no one to bring a lawsuit over human extinction and no way to do so if a catastrophe is so disruptive that the legal system is no longer functional.

But even short of these sorts of apocalyptic scenarios, an AI system might do so much harm — say by killing one million people — that any compensatory damages award would push the company that created the system into bankruptcy and blow through any plausible insurance policy.

“The development of frontier artificial intelligence systems should be legally treated like housing wild animals or blasting dynamite, because its consequences can also be abnormally dangerous.”

At this point, the company has no further reason to care about additional increases in the potential harm they might cause, as it’s irrelevant to their financial interests. This is why direct liability for harm caused is an ineffective mechanism for regulating uninsurable risks.

Punitive Damages

Fortunately, there is another well-established legal tool available: punitive damages. If an AI system does some practically compensable harm, but it looks like it could have easily gone a lot worse, causing an uninsurable catastrophe, then the human or corporation responsible should be held liable not just for the harm they caused, but for the uninsurable risks they generated. 

Imagine an AI system tasked with running a clinical trial for a drug that carries a significant risk of extreme side effects. The system struggles to recruit participants honestly. Instead of reporting these difficulties to its human overseers, the AI system manipulates people into participation. Some of those participants suffer terrible side effects from the drug and sue.

This case would qualify as a “near miss” for a much larger catastrophe. The drug company using the system probably didn’t want the AI to behave the way it did. That is, the system was misaligned with its users and creators — it engaged in acts of deceit and coercion that neither party intended.

For whatever reason, the AI, in this case, revealed its misalignment in a non-catastrophic fashion, and so is likely to be reprogrammed or shut down. But a more situationally aware system might pretend to serve its users faithfully until presented with an opportunity to achieve a more ambitious goal — breaking free of human control. If the system patiently bided its time in this way, the human toll might be far worse. This outcome exemplifies the kind of risks AI developers and drug companies take on when they decide to train and deploy an AI system.

In this kind of “near miss” or “warning shot” case, compensatory damages — the ordinary damages in a tort lawsuit, intended to make the plaintiff whole — are inadequate. Since we can’t hold AI developers liable if their systems cause uninsurable catastrophes, we have to hold them liable for generating uninsurable risks. And we are best positioned to do that when their systems reveal themselves to be dangerous by causing smaller harms that are associated with similar larger risks.

These risks have been recognized by the leaders of the top AI developers in the world. Anthropic CEO Dario Amodei, OpenAI CEO Sam Altman, and DeepMind CEO Demis Hassabis all signed onto the San Francisco-based nonprofit Center for AI Safety’s statement last year that declared “mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

Anthropic’s policy team also recently sent a letter to California lawmakers detailing its opposition to state Sen. Scott Wiener’s frontier AI regulation bill as currently written. In the letter, Anthropic declares its support for a focus “on deterrence rather than pre-harm enforcement” by “holding companies responsible for causing actual catastrophes.” While I believe the letter substantially mischaracterizes several provisions of SB 1047, I broadly agree with these sentiments.

Unfortunately, the liability policies Anthropic endorses in the letter are deeply inadequate. The company repudiated prescriptive standards requiring AI developers to implement specific safety techniques and pushed back on penalties or the destruction of core AI model functions if a company fails to follow the prescriptive standards.

Their proposal does allow that “if an actual catastrophic incident occurs, and a company’s [safety and security protocol] falls short of best practices or relevant standards, in a way that materially contributed to the catastrophe, then the developer should also share liability, even if the catastrophe was partly precipitated by a downstream actor.”

There are two major problems with this. First, AI alignment and safety are unsolved technical problems. Or, as Anthropic states in the letter: “AI safety is a nascent field where best practices are the subject of original scientific research.” Precisely because AI safety involves unsolved technical problems, merely implementing known “best practices” and “relevant standards” is unlikely to be adequate to protect society from the risks posed by advanced artificial intelligence.

Consider an AI system that displays the capability to materially contribute to a major cybersecurity breach or to building a powerful bioweapon. If the developer of this model implements state-of-the-art alignment techniques and safeguards against its misuse, this still might not provide adequate protection against the model being used to shut down the power grid of a major city or launch a bioweapon attack.

“We need a balanced approach that allows society to capture the enormous potential benefits of AI, while minimizing the risks.”

We need a liability framework that forces AI developers to internalize these catastrophic risks — to treat the risks like they are threats to the company’s bottom line, not just to public safety — not merely to push them to implement “best practices” and “relevant standards.”

The responsible course of action under such circumstances may be to refrain from deploying the model until state-of-the-art alignment techniques, safeguard designs or tools for understanding the internal operations of AI systems advance enough that implementing current best practices renders deployment of the model reasonably safe.

Anthropic is right to worry that regulators are poorly positioned to write prescriptive rules that would strike the optimal balance between safety and innovation. Indeed, frontier AI developers, by dint of their superior technical expertise and inside knowledge of the capabilities and vulnerabilities of their systems, are far better positioned to do so.

But the way to get the key decision-makers at frontier AI companies to bring this knowledge to bear is to make AI companies pay for any harm they cause, regardless of whether they followed best practices. The best practices may simply not be good enough. One might object that such a strict liability approach would excessively inhibit innovation.

But if the benefits of faster AI development really do outweigh the risks, then AI developers and their customers (who would absorb part of the cost through higher prices) should be willing to continue rolling out new AI advances at roughly the same pace as they would without a strong liability regime.

It is only when the expected liability from the risks associated with a system begins to rival those benefits that the prospect of liability would start to inhibit AI innovation and deployment; that is precisely when society should want developers to exercise caution.
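To make that calculus concrete, here is a minimal sketch of the deployment decision under strict liability. The function name and all dollar figures are hypothetical illustrations chosen for this example, not estimates from the article:

```python
# A minimal sketch of the deployment calculus described above.
# All figures and names are hypothetical illustrations, not estimates.

def deploy_is_worthwhile(expected_benefit: float,
                         harm_probability: float,
                         harm_size: float) -> bool:
    """Under strict liability, a developer expects to pay for the harm its
    system causes, so deployment only makes sense if the expected benefits
    exceed the expected liability."""
    expected_liability = harm_probability * harm_size
    return expected_benefit > expected_liability

# A system whose benefits dwarf its expected liability still gets deployed ...
print(deploy_is_worthwhile(expected_benefit=1e9, harm_probability=0.001, harm_size=1e10))  # True
# ... while one whose expected liability rivals its benefits is held back.
print(deploy_is_worthwhile(expected_benefit=1e9, harm_probability=0.2, harm_size=1e10))    # False
```

The point of the sketch is simply that liability only changes the deployment decision once expected payouts approach expected benefits, which is exactly when society wants the decision to change.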

But there’s a second, equally important, problem with Anthropic’s prescription that liability only be imposed “if an actual catastrophic incident occurs.” Recall that Anthropic’s Amodei signed onto the statement expressing concern that future AI systems may threaten human extinction. For obvious reasons, it would not be practical to impose liability on an AI developer whose system has already materially contributed to a completed extinction-level harm.

Indeed, for harms even well short of human extinction, society may be sufficiently disrupted that the legal system is no longer functional. And for a merely “ordinary” sort of large catastrophe, say the death of one million people, the damages may be so large as to push the responsible parties into bankruptcy and exceed the plausible payout of any insurance policy.

To put a fine point on it, Anthropic’s preferred mode of AI risk governance, deterrence-based liability, cannot meaningfully address extinction-level risks (and uninsurable risks generally) merely by holding AI developers liable for the actual harms caused when their system materially contributes to a catastrophe. 

That raises the question of how Anthropic and the other leading AI developers who signed on to the Center for AI Safety’s statement would propose to regulate such risks. Indeed, it is concern over such uninsurable risks that has driven much of the interest in the sort of prescriptive regulation and pre-harm enforcement that Anthropic decries.

If Anthropic is serious about their stated concerns regarding existential risks and the shortcomings of prescriptive regulation, then the implication is obvious. Anthropic should embrace my proposal for punitive damages in cases where advanced AI systems materially contribute to practically compensable harm in such a way that indicates that the decision to train or deploy the system generated uninsurable risk.

This approach addresses Anthropic’s core concern with SB 1047, namely its reliance on a combination of pre-harm enforcement and prescriptive rules, while still providing AI developers with meaningful incentives to protect humanity from the most extreme risks posed by their products.

A Political Alternative

One might reasonably ask why this approach has not been adopted in other areas, like gun violence or the risks associated with nuclear power. The details differ across these and other cases, but the most general explanation is simple: politics.

Let’s take gun manufacturer liability. The Protection of Lawful Commerce in Arms Act, a federal law passed in 2005, protects firearms manufacturers and dealers from being held liable for crimes committed using their products. In essence, Congress has explicitly ensured liability is not used as a backdoor mechanism for gun regulation.

This is bound up in the broader politics of gun control in the U.S. That said, there are also some reasonable distinctions to be drawn between the situations presented by gun violence and AI risk.

“We need a liability framework that forces AI developers to internalize catastrophic risks — to treat the risks like they are threats to the company’s bottom line, not just to public safety.”

While both guns and AI systems are dual-use technologies (guns can be used for hunting as well as self-defense and recreation, for example), for guns the undesirable uses are inherently bound up with the desirable ones. There is no existing way to manufacture a gun that can be used for home defense but not for murder. The same cannot be said for AI.

AI developers are actively working to create safeguards to prevent the misuse of their systems. That said, gun control advocates argue, quite plausibly, that the guns used in many mass shootings do not produce sufficient benefits for their users to justify their risks to society.

The problem is that Congress doesn’t agree on this issue. If enough lawmakers became convinced that the availability of AR-15s or bump stocks was bad for society, they could write a law to ban them. It is not clear why liability is a preferable mechanism for addressing the risks posed by such weapons. 

What about nuclear power? Well, nuclear power plants are subject to liability in the event of radioactive releases, but that liability is capped under the Price-Anderson Act, passed in 1957. The law includes several mechanisms to expedite and facilitate the recovery of damages for victims of nuclear accidents while limiting the extreme risks for providers of nuclear energy.

This liability approach was paired with a set of prescriptive regulations that were strengthened by the Energy Reorganization Act of 1974. Collectively, this approach has almost entirely stifled nuclear energy development.

But whatever its merits in the nuclear energy context, it does not make sense for frontier AI. At least with nuclear energy, we know how to regulate power plants to ensure their safety. In fact, nuclear power has proven much safer than coal- or natural-gas-fired power plants, and comparably safe to wind and solar.

This is not the situation with frontier AI systems. The current techniques we have to ensure AI behaves in a manner aligned with human intentions and values are both imperfect and not readily applicable to systems more powerful than those that we have today. There is no set of rules a regulator can write to ensure safety, short of enforcing a global ban on the development of models beyond a certain capability threshold.

Given the inadequacy of prescriptive regulations, liability is an essential governance tool for AI safety. We need AI developers to treat risks of harm to third parties, including the low probability risk that their system causes catastrophic harm, like threats to their bottom line.

The Price-Anderson model, which caps the liability of nuclear power plants, only makes sense if you approach liability primarily as a means of ensuring compensation for harmed parties, rather than as a mechanism for mitigating catastrophic risk.

The Limits Of Liability

That said, it is worth acknowledging the fundamental limits of liability as a means of limiting AI risk. First, liability is generally not an effective solution for public goods problems — things like national defense or basic research — that are equally provided to all individuals.

Because corporate providers of public goods can’t easily capture much monetary benefit from what they generate, public goods tend to be underprovided by free markets. And so, in the world of AI, alignment and safety research is an example of a public good that most developers are likely to underinvest in, even if they expect to pay for the harms their systems might cause. This is why liability cannot address the full range of policy challenges advanced AI presents.

One potential solution is to offer prizes for major research breakthroughs and other forms of government subsidies. Private philanthropy already provides substantial funding for AI alignment and safety research and can fill in the gaps left by government support going forward.

There may also be winner-take-all race dynamics, among companies trying to hit various key capabilities benchmarks, that generate other market distortions.

Second, liability is poorly positioned to mitigate risks that involve structural harms or long and complicated chains of causation. For example, no one would be entitled to bring a tort suit if AI systems engage in election interference or the spread of misinformation. While I am skeptical of the magnitude of these risks, it is still worth acknowledging that other policy tools would be needed to address them.

“Liability is the best tool for solving most of the policy problems presented by advances in artificial intelligence.”

Similarly, there are ways of contributing to AI risk that are too indirect for the liability system to handle. Open weights AI models, whose learned parameters are made public for other developers to fine-tune, raise this sort of concern in particular. If an open weights model is used, even in a somewhat modified form, to do harm, it is plausible that the model developer could be held liable.

But if the developer of an advanced AI system releases information about the system and its development that enables more AI developers to catch up with the leading labs, as Meta plausibly did with its recent release of Llama 3.1, this could induce frontier AI developers to rush out new products hastily and carelessly.

If the result of this causal chain is that a different lab’s system is directly implicated in causing harm, it would not be plausible to sue the lab that released the initial open weights model, even though this action materially increased the risk of the ultimate harm. 

Liability also typically struggles with uninsurable risks — those that might be the source of catastrophes so large that it would not be practically feasible to enforce a compensatory damages award — if “warning shots” or smaller, insurable harms are unlikely.

For the prospect of punitive damages to motivate AI developers to take costly steps to reduce uninsurable risks, these near-miss cases of practically compensable harm associated with uninsurable risks must be expected to occur much more often than actual uninsurable catastrophes.

If there are 1,000 “warning shots” for every actual uninsurable catastrophe, then punitive damages can address harms as large as 1,000 times the maximum insurable value. But if such near-miss cases are much rarer — say, only 10 “warning shots” per uninsurable catastrophe — then even punitive damages are unlikely to incentivize AI developers to mitigate the most extreme risks.
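To make the arithmetic concrete, here is a minimal sketch of how the warning-shot ratio bounds what punitive damages can internalize. The dollar figures and the helper function are hypothetical illustrations for this example, not estimates drawn from any actual company or case:

```python
# A minimal sketch of the warning-shot arithmetic described above.
# All dollar figures are hypothetical illustrations, not estimates.

MAX_COLLECTABLE = 50e9  # hypothetical ceiling: company assets plus insurance

def punitive_award_per_warning_shot(catastrophe_harm: float,
                                    warning_shots_per_catastrophe: float) -> float:
    """Punitive damages needed in each 'warning shot' case so that expected
    payouts across warning shots match the expected uninsurable harm."""
    return catastrophe_harm / warning_shots_per_catastrophe

for ratio in (1_000, 10):
    needed = punitive_award_per_warning_shot(
        catastrophe_harm=5e12,  # hypothetical $5 trillion catastrophe
        warning_shots_per_catastrophe=ratio,
    )
    print(f"{ratio:>5} warning shots per catastrophe -> "
          f"${needed:,.0f} per case (collectable: {needed <= MAX_COLLECTABLE})")
```

With 1,000 warning shots per catastrophe, the required award stays within what a developer and its insurers could plausibly pay; with only 10, it does not, which is why rare warning shots blunt the deterrent.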

If we come to have strong reasons to think that warning shots are relatively unlikely, then we may need to consider backstop approaches like the FDA-style approval regulation model that Emilia Javorsky proposed in this magazine.

Technical researchers could also work on laying the groundwork for implementing liability insurance and punitive damages. To set liability insurance requirements, regulators must estimate the maximum harm any particular AI system would likely cause. This would likely require advances in the science of model evaluation and risk analysis. Insurance companies will also depend on similar evaluations to help determine what systems they are willing to insure and how high they should set their premiums.

Finally, applying punitive damages in “warning shot” cases will require courts to estimate the size of the uninsurable risk an AI system might cause. Developing methods to assess these risks accurately should be a priority for technical AI safety researchers.

There is also a risk that a strong liability framework, like any robust form of regulation, would cause AI developers to relocate to less regulated jurisdictions. This phenomenon, known as regulatory arbitrage, is always a concern if any one jurisdiction imposes substantially stronger regulations than its trade partners.

While this risk is dampened somewhat by the fact that foreign AI developers would still be subject to liability if they harm people in countries with strong liability regimes, it may prove difficult to enforce those judgments if the developer lacks substantial assets in the country where the injuries occur (as is already the case for many cyber harms).

A series of international treaties establishing reciprocal enforcement of liability judgments reached by one another’s courts would be a good first step, but stronger international coordination, more broadly, may also prove necessary.

Finally, government actors are less likely to be deterred by the threat of liability than profit-maximizing corporations. The principle of sovereign immunity generally shields governments from liability. But even if governments waive sovereign immunity when they are responsible for AI-related harms, government officials are typically more responsive to political than economic incentives. This suggests liability would be an ineffective tool in scenarios where the most powerful and dangerous AI systems are nationalized.

But even short of AI becoming outright nationalized, governments are likely to become major users of AI systems in the coming years. Therefore, we should not count on liability to effectively deter governments from abusing the new powers that advances in AI technology may afford them. Ideas like programming AI to follow the law with reasonable fidelity are likely to be necessary constraints on the deployment of government-run AI.

“Given the inadequacy of prescriptive regulations, liability is an essential governance tool for AI safety.”

These caveats notwithstanding, liability is the best tool for solving most of the policy problems presented by advances in artificial intelligence. Strict liability gives AI developers the incentive to constantly strive to make their systems safer, even in ways no regulator could imagine requiring.

Liability also naturally scales with the magnitude of the underlying risk. There is wide social disagreement about the likelihood of some of the most extreme potential harms from AI. Any set of prescriptive regulations would need to be premised, at least implicitly, on some estimate of those risks.

Reaching social consensus on the magnitude of AI risk is likely to prove a challenge, and any consensus that is reached may not track the underlying reality. Liability elegantly sidesteps those issues. Holding AI developers accountable under strict liability for the harm they cause makes sense regardless of the precise magnitude of the risks involved.

Most discussions of AI governance to date have focused on having governments make rules for AI developers to follow. But there are good reasons to think this task will be extremely difficult and costly. After all, leading AI companies with dedicated alignment teams don’t know how to make their systems safe. And the top talent is both extremely well-compensated and highly concentrated in the private sector.

It will likely be difficult for governments to develop in-house expertise, and, even if they do, the task of writing rules that ensure the safety of AI systems will be daunting. After all, AI alignment and safety are technical problems that remain unsolved. While I do think there is some scope for regulatory approaches to AI governance, the bulk of the risk mitigation load is best addressed with a simple principle: AI developers should pay for the damage they cause and the uninsurable risks that they generate.

While the courts could bring about this vision on their own, doing so would require waiting for cases of AI harm to occur and for plaintiffs to sue. State legislators and voters — especially in states like California that allow ballot initiatives — could act immediately to hold AI developers accountable through new laws and measures.

In fact, the California Legislature recently passed SB 1047, a bill to establish safety and liability standards for frontier AI development (though it stops short of strict liability). It now awaits Gov. Gavin Newsom’s signature or veto.

Strict liability is a sensible starting point because it mimics the existing legal frameworks we use for other societal harms and is, by its very nature, a simple principle: If an AI developer’s AI breaks it — causes a foreseeable harm — then they should pay.