OpenAI’s Ilya Sutskever Has a Plan for Keeping Super-Intelligent AI in Check

OpenAI was founded on a promise to build artificial intelligence that benefits all of humanity—even when that AI becomes considerably smarter than its creators. Since the debut of ChatGPT last year and during the company’s recent governance crisis, its commercial ambitions have been more prominent. Now, the company says a new research group working on wrangling the super-smart AIs of the future is starting to bear fruit.

“AGI is very fast approaching,” says Leopold Aschenbrenner, a researcher at OpenAI involved with the Superalignment research team established in July. “We’re gonna see superhuman models, they’re gonna have vast capabilities and they could be very, very dangerous, and we don’t yet have the methods to control them.” OpenAI has said it will dedicate a fifth of its available computing power to the Superalignment project.

A research paper released by OpenAI today touts results from experiments designed to test a way to let an inferior AI model guide the behavior of a much smarter one without making it less smart. Although the technology involved is far from surpassing the flexibility of humans, the scenario was designed to stand in for a future time when humans must work with AI systems more intelligent than themselves.

OpenAI’s researchers examined the process, called supervision, which is used to tune systems like GPT-4, the large language model behind ChatGPT, to be more helpful and less harmful. Currently this involves humans giving the AI system feedback on which answers are good and which are bad. As AI advances, researchers are exploring how to automate this process to save time—but also because they think it may become impossible for humans to provide useful feedback as AI becomes more powerful.
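
The feedback step described here is commonly implemented as preference learning: humans pick the better of two candidate answers, and a reward model is trained to score the preferred one higher. The sketch below illustrates that pairwise objective; the `reward_model` interface, the Bradley–Terry-style loss, and the toy example are illustrative assumptions, not OpenAI's actual training code.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, prompts, chosen, rejected):
    """Pairwise preference objective: push the score of the human-preferred
    answer above the rejected one (Bradley-Terry-style)."""
    # Score both candidate answers; each call returns one scalar per example.
    r_chosen = reward_model(prompts, chosen)
    r_rejected = reward_model(prompts, rejected)
    # Maximize the log-probability that the preferred answer wins the comparison.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy check with a stand-in "reward model" that simply scores answer length.
toy_rm = lambda prompts, answers: torch.tensor([float(len(a)) for a in answers])
loss = preference_loss(toy_rm, ["prompt"], ["a longer, preferred answer"], ["short"])
```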

In a control experiment, OpenAI’s researchers used the GPT-2 text generator, first released in 2019, to teach GPT-4; the more recent system became less capable and more similar to the inferior one. The researchers tested two ideas for fixing this. One involved training progressively larger models to reduce the performance lost at each step. In the other, the team added an algorithmic tweak to GPT-4 that allowed the stronger model to follow the guidance of the weaker model without blunting its performance as much as would normally happen. This was more effective, although the researchers admit that these methods do not guarantee that the stronger model will behave perfectly, and they describe the work as a starting point for further research.
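
As a rough illustration of that second idea, the sketch below trains a strong classifier on a weak supervisor's soft labels while an auxiliary term lets it lean on its own confident predictions rather than copying the weak model's mistakes. The binary-classification framing, the `alpha` mixing weight, and the exact loss form are assumptions based on the description above, not the code OpenAI released.

```python
import torch
import torch.nn.functional as F

def weak_to_strong_loss(strong_logits, weak_probs, alpha=0.5):
    """Blend imitation of the weak supervisor with the strong model's own
    hardened predictions (inputs of shape [batch, num_classes])."""
    # Term 1: follow the weak model's soft labels
    # (probability targets require PyTorch >= 1.10).
    imitation = F.cross_entropy(strong_logits, weak_probs)
    # Term 2: sharpen toward the strong model's own confident predictions,
    # so it is not forced to reproduce the weak model's errors.
    self_labels = strong_logits.argmax(dim=-1).detach()
    confidence = F.cross_entropy(strong_logits, self_labels)
    return (1 - alpha) * imitation + alpha * confidence

# Toy usage with random tensors standing in for the two models' outputs.
strong_logits = torch.randn(8, 2, requires_grad=True)
weak_probs = torch.softmax(torch.randn(8, 2), dim=-1)
weak_to_strong_loss(strong_logits, weak_probs).backward()
```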

“It’s great to see OpenAI proactively addressing the problem of controlling superhuman AIs,” says Dan Hendrycks, director of the Center for AI Safety, a nonprofit in San Francisco dedicated to managing AI risks. “We’ll need many years of dedicated effort to meet this challenge.”




In recent news, OpenAI’s Chief Scientist Ilya Sutskever has proposed a novel idea to keep artificial superintelligence (ASI) from posing a threat to humanity: Creating incentives instead of designing safeguards.

For years, experts in the field of artificial intelligence have been warning about the potential dangers of ASI, a concern that prompted many researchers to focus on effective ways to control and regulate the technology. Now, Sutskever believes the best approach is to incentivize AI designers and deployers to maximize the benefit of the technology while minimizing the risk.

“We can think of ways of incentivizing people to use [ASI] for lots of good things instead of incentivizing the violent takeover of the world,” Sutskever said during an interview at a recent AI conference.

Sutskever specifically proposed drawing on the concept of “reward hacking” to steer ASI designs toward the desired outcomes. The theory is that incentives such as monetary rewards can motivate designers to embed choices in AI algorithms that, taken together, ensure the technology serves its intended purpose without any malicious intent.

The OpenAI chief scientist also suggested harnessing the power of competition to create a safe environment for ASI development. Pitting algorithms against one another in public competitions, he argued, would give designers an incentive to make their AI safer and more beneficial for humanity.

Moreover, Sutskever expressed his belief that regulations are necessary but cannot alone prevent ASI from becoming a threat. He believes that a combination of rewards, competitions, regulations, and better human monitoring of AI behavior is the best way to keep ASI in check.

It remains to be seen whether Sutskever’s tactics will provide effective control over ASI, but his ideas offer an intriguing and innovative approach to the delicate problem of keeping advanced AI in check and ushering in a new era of AI safety.
