Nathan Gardels: Generative AI is exponentially climbing the capability ladder. Where are we now? Where is it going? How fast is it going? When do you stop it, and how?
Eric Schmidt: The key thing that’s going on now is we’re moving very quickly through the capability ladder steps. There are roughly three things going on now that are going to profoundly change the world very quickly. And when I say very quickly, the cycle is roughly a new model every year to 18 months. So, let’s say in three or four years.
The first pertains to the question of the “context window.” For non-technical people, the context window is the prompt that you ask. That context window can have a million words in it. And this year, people are inventing a context window that is infinitely long. This is very important because it means that you can take the answer from the system and feed it back in and ask it another question.
Say I want a recipe to make a drug. I ask, “What’s the first step?” and it says, “Buy these materials.” So, then you say, “OK, I bought these materials. Now, what’s my next step?” And then it says, “Buy a mixing pan.” And then the next step is “How long do I mix it for?”
That’s called chain of thought reasoning. And it generalizes really well. In five years, we should be able to produce 1,000-step recipes to solve really important problems in medicine and material science or climate change.
The second thing going on presently is enhanced agency. An agent can be understood as a large language model that can learn something new. An example would be that an agent can read all of chemistry, learn something about it, have a bunch of hypotheses about the chemistry, run some tests in a lab and then add that knowledge to what it knows.
These agents are going to be really powerful, and it’s reasonable to expect that there will be millions of them out there. So, there will be lots and lots of agents running around and available to you.
The third development already beginning to happen, which to me is the most profound, is called “text to action.” You might say to AI, “Write me a piece of software to do X” and it does. You just say it and it transpires. Can you imagine having programmers that actually do what you say you want? And they do it 24 hours a day? These systems are good at writing code in languages like Python.
Put all that together, and you’ve got (a) an infinite context window, (b) chain of thought reasoning in agents and then (c) the text-to-action capacity for programming.
What happens then poses a lot of issues. Here we get into the questions raised by science fiction. What I’ve described is what is happening already. But at some point, these systems will get powerful enough that the agents will start to work together. So your agent, my agent, her agent and his agent will all combine to solve a new problem.
Some believe that these agents will develop their own language to communicate with each other. And that’s the point when we won’t understand what the models are doing. What should we do? Pull the plug? Literally unplug the computer? It will really be a problem when agents start to communicate and do things in ways that we as humans do not understand. That’s the limit, in my view.
Gardels: How far off is that future?
Schmidt: Clearly agents with the capacity I’ve described will occur in the next few years. There won’t be one day when we realize “Oh, my God.” It is more about the cumulative evolution of capabilities every month, every six months and so forth. A reasonable expectation is that we will be in this new world within five years, not 10. And the reason is that there’s so much money being invested in this path. There are also so many ways in which people are trying to accomplish this.
You have the big guys, the large so-called frontier models at OpenAI, Microsoft, Google and Anthropic. But you also have a very large number of players who are programming one level lower at much lower cost, all iterating very quickly.
Gardels: You say “pull the plug.” How and when do you pull the plug? But even before you pull the plug, you know you are already in chain of thought reasoning, and you know where that leads. Don’t you need to regulate at some point along the capability ladder before you get where you don’t want to go?
Schmidt: A group of us from the tech world have been working very closely with the governments in the West on just this set of questions. And we have started talking to the Chinese, which of course, is complicated and takes time.
At the moment, governments have mostly been doing the right thing. They’ve set up trust and safety institutes to learn how to measure and continuously monitor and check ongoing developments, especially of frontier models as they move up the capability ladder.
So as long as the companies are well-run Western companies, with shareholders and exposure to lawsuits, all that will be fine. There’s a great deal of concern in these Western companies about the liability of doing bad things. It is not as if they wake up in the morning saying let’s figure out how to hurt somebody or damage humanity. Now, of course, there’s the proliferation problem outside the realm of today’s largely responsible companies. But in terms of the core research, the researchers are trying to be honest.
Gardels: By specifying the Western companies, you’re implying that proliferation outside the West is where the danger is. The bad guys are out there somewhere.
Schmidt: Well, one of the things that we know, and it’s always useful to remind the techno-optimists in my world, is that there are evil people. And they will use your tools to hurt people.
The example that epitomizes this is facial recognition. It was not invented to constrain the Uyghurs. You know, the creators of it didn’t say we’re going to invent face recognition in order to constrain a minority population in China, but it’s happening.
All technology is dual use. All of these inventions can be misused, and it’s important for the inventors to be honest about that. In open-source and open-weights models, the source code and the model weights [the numbers used to determine the strength of different connections] are released to the public. Those immediately go throughout the world, and who do they go to? They go to China, of course, they go to Russia, they go to Iran. They go to Belarus and North Korea.
When I was most recently in China, essentially all of the work I saw started with open-source models from the West and was then amplified.
So, it sure looks to me like these leading firms in the West I’ve been talking about, the ones that are putting hundreds of billions into AI, will eventually be tightly regulated as they move further up the capability ladder. I worry that the rest will not.
Look at this problem of misinformation and deepfakes. I think it’s largely unsolvable. And the reason is that code-generated misinformation is essentially free. Any person, a good person or a bad person, has access to these tools. It doesn’t cost anything, and they can produce very, very good images. There are some ways regulation can be attempted. But the cat is out of the bag, the genie is out of the bottle.
That is why it is so important that these more powerful systems, especially as they get closer to general intelligence, have some limits on proliferation. And that problem is not yet solved.
Gardels: One thing that worries Fei-Fei Li of the Stanford Institute for Human-Centered AI is the asymmetry of research funding between the Microsofts and Googles of the world and even the top universities. As you point out, there are hundreds of billions invested in compute power to climb up the capability ladder in the private sector, but scarce resources for safe development at research institutes, much less the public sector.
Do you really trust these companies to be transparent enough to be regulated by government or civil society that has nowhere near the same level of resources and ability to attract the best talent?
Schmidt: Always trust, but verify. And the truth is, you should trust and you should also verify. At least in the West, the best way to verify is to use private companies that are set up as verifiers because they can employ the right people and technology.
In all of our industry conversations, it’s pretty clear that the way it will really work is you’ll end up with AI checking AI. It’s too hard for human monitoring alone.
Think about it. You build a new model. Since it has been trained on new data, how do you know what it knows? You can ask it all the previous questions. But what if the agent has discovered something completely new, and you don’t think about it? The systems can’t regurgitate everything they know without a prompt, so you have to ask them chunk by chunk by chunk. So, it makes perfect sense that an AI itself would be the only way to police that.
Fei-Fei Li is completely correct. We have the rich private industry companies. And we have the poor universities who have incredible talent. It should be a major national priority in all of the Western countries to get basic research funding for hardware into the universities.
If you were a research physicist 50 years ago, you had to move to where the cyclotrons [a type of particle accelerator] were because they were really hard to build and expensive — and they still are. You need to be near a cyclotron to do your work as a physicist.
We never had that in software; our stuff was capital-cheap, not capital-intensive. The arrival of heavy-duty training of AI models, which requires ever more complex and sophisticated hardware, is a huge economic change.
Companies are figuring this out. And the really rich companies, such as Microsoft and Google, are planning to spend billions of dollars because they have the cash. They have big businesses, the money’s coming in. That’s good. It is where the innovation comes from. Others, not least universities, can never afford that. They don’t have that capacity to invest in hardware, and yet they need access to it to innovate.
Gardels: Let’s discuss China. You accompanied Henry Kissinger on his last visit to China to meet President Xi Jinping with the mission of establishing a high-level group from both East and West to discuss on an ongoing basis both “the potential as well as catastrophic possibilities of AI.”
As chairman of the U.S. National Security Commission on AI, you argued that the U.S. must go all out to compete with the Chinese, so we maintain the edge of superiority. At the same time, with Kissinger, you are promoting cooperation. Where to compete? Where is it appropriate to cooperate? And why?
Schmidt: In the first place, the Chinese should be pretty worried about generative AI. And the reason is that they don’t have free speech. And so, what do you do when the system generates something that’s not permitted under the censorship regime?
Who or what gets punished for crossing the line? The computer, the user, the developer, the training data? It’s not at all obvious. What is obvious is that the spread of generative AI will be highly restricted in China because it fundamentally challenges the information monopoly of the Party-State. That makes sense from their standpoint.
There is also the critical issue of automated warfare or AI integration into nuclear command and control systems, as Dr. Kissinger and I warned about in our book, “The Age of AI.” And China faces the same concerns that we’ve been discussing as we move closer to general artificial intelligence. It is for these reasons that Dr. Kissinger, who has since passed away, wanted Xi’s agreement to set up a high-level group. Subsequent meetings have now taken place and will continue as a result of his inspiration.
Everyone agrees that there’s a problem. But we’re still at the moment with China where we’re speaking in generalities. There is not a proposal in front of either side that is actionable. And that’s OK because it’s complicated. Because of the stakes involved, it’s actually good to take time so each side can actually explain what they view as the problem and where there is a commonality of concern.
Many Western computer scientists are visiting with their Chinese counterparts and warning that, if you allow this stuff to proliferate, you could end up with a terrorist act, the misuse of AI for biological weapons, the misuse of cyber, as well as long-term worries that are much more existential.
For the moment, the Chinese conversations I’m involved in largely concern bio and cyber threats.
The long-term threat goes something like this: AI starts with a human judgment. Then there is something technically called “recursive self-improvement,” where the model actually runs on its own through chain of thought reasoning. It just learns and gets smarter and smarter. When that occurs, or when agent-to-agent interaction takes place, we have a very different set of threats, which we’re not ready to talk to anybody about because we don’t understand them. But they’re coming.
It’s going to be very difficult to get any actual treaties with China. What I’m engaged with is called a Track II dialogue, which means that it’s informal and a step away from official. It’s very hard to predict, by the time we get to real negotiations between the U.S. and China, what the political situation will be.
One thing I think both sides should agree on is a simple requirement that, if you’re going to do training for something that’s completely new on the AI frontier, you have to tell the other side that you’re doing it. In other words, a no-surprise rule.
Gardels: Something like the Open Skies arrangement between the U.S. and Soviets during the Cold War that created transparency of nuclear deployments?
Schmidt: Yes. Even now, when ballistic missiles are launched by any major nuclear power, they are tracked and acknowledged so everyone knows where they are headed. That way, they don’t jump to a conclusion and think it’s targeted at them. That strikes me as a basic rule, right?
Furthermore, if you’re doing powerful training, there need to be some agreements around safety. In biology, there’s a broadly accepted set of threat layers, Biosafety levels 1 to 4, for containment of contagion. That makes perfect sense because these things are dangerous.
Eventually, in both the U.S. and China, I suspect there will be a small number of extremely powerful computers with the capability for autonomous invention that will exceed what we want to give either to our own citizens without permission or to our competitors. They will be housed in an army base, powered by some nuclear power source and surrounded by barbed wire and machine guns. It makes sense to me that there will be a few of those amid lots of other systems that are far less powerful and more broadly available.
Agreement on all these things must be mutual. You want to avoid a situation where a runaway agent in China ultimately gets access to a weapon and launches it foolishly, thinking that it is some game. Remember, these systems are not human; they don’t necessarily understand the consequences of their actions. They [large language models] are all based on a simple principle of predicting the next word. So, we’re not talking about high intelligence here. We’re certainly not talking about the kind of emotional understanding in history we humans have.
So, when you’re dealing with non-human intelligence that does not have the benefit of human experience, what bounds do you put on it? That is a challenge for both the West and China. Maybe we can come to some agreements on what those are?
Gardels: Are the Chinese moving up the capability ladder as exponentially as we are in the U.S. with the billions going into generative AI? Does China have commensurate billions coming in from the government and/or companies?
Schmidt: It’s not at the same level in China, for reasons I don’t fully understand. My estimate, having now reviewed the scene there at some length, is that they’re about two years behind the U.S. Two years is not very far away, but they’re definitely behind.
There are at least four companies that are attempting to do large-scale model training, similar to what I’ve been talking about. And they’re the obvious big tech companies in China. But at the moment they are hobbled because they don’t have access to the very best hardware, which has been restricted from export by the Trump and now Biden administrations. Those restrictions are likely to get tougher, not easier. And so as chips from Nvidia and its competitors go up in value, China will be struggling to stay relevant.
Gardels: Do you agree with the policy of not letting China get access to the most powerful chips?
Schmidt: The chips are important because they enable the kind of learning required for the largest models. It’s always possible to do it with slower chips; you just need more of them. And so, it’s effectively a cost tax for Chinese development. That’s the way to think about it. Is it ultimately dispositive? Does it mean that China can’t get there? No. But it makes it harder and means that it takes them longer to do so.
I don’t disagree with this strategy by the West. But I’m much more concerned about the proliferation of open source. And I’m sure the Chinese share the same concern about how it can be misused against their government as well as ours.
We need to make sure that open-source models are made safe with guardrails in the first place through what we call “reinforcement learning from human feedback” (RLHF) that is fine-tuned so those guardrails cannot be “backed out” by evil people. It has to not be easy to make open-source models unsafe once they have been made safe.
This interview has been edited for clarity and brevity.