DeepMind says that teaching machines to realistically mimic human language is more complicated than simply throwing ever more computing power at the problem, although that remains the dominant strategy in the field.
In recent years, much of the progress in artificial intelligence (AI) has come from making models larger and training them on more data using the biggest computers available. But this makes AI costly, cumbersome and resource-hungry. A recent system created by Microsoft and Nvidia took more than a month to train on a supercomputer with about 4,500 high-powered graphics cards, at a cost of several million dollars.
In an effort to find alternatives, AI company DeepMind created a model that can search for information in a large database, much as a human would use a search engine. This avoids having to bake all of the model's knowledge into it during training. The company's researchers say this strategy could produce models that compete with the state of the art while being far smaller and less complex.
Language AI took a big step forward last year with the release of GPT-3, a model developed by US company OpenAI that surprised researchers with its ability to generate fluent streams of text. Since then, models have grown bigger and bigger: GPT-3's neural network used 175 billion parameters, while Microsoft and Nvidia's most recent model, the Megatron-Turing Natural Language Generation model, has 530 billion parameters.
But scale has its limits: Megatron-Turing pushed performance benchmarks only slightly beyond GPT-3, despite the huge increase in parameters. On one benchmark, in which the AI must predict the last word of sentences, GPT-3's accuracy was 86.4 per cent, while Megatron-Turing's was 87.2 per cent.
Researchers at DeepMind first studied the effects of scale on such systems by creating six language models, ranging from 44 million parameters to 280 billion. They then assessed their abilities on a set of 152 different tasks and found that scale did improve performance: the largest model outperformed GPT-3 on about 82 per cent of the tests. On a common reading-comprehension benchmark, it scored 71.6, higher than GPT-3's 46.8 and Megatron-Turing's 47.9.
But the DeepMind team found that while some areas showed significant gains with scale, others, such as logical and mathematical reasoning, showed little improvement. The company now says that scale alone is not how it intends to achieve its goal of a realistic language model that can understand complex logical statements, and it has released a model called the Retrieval-Enhanced Transformer (RETRO), which looks up information rather than memorising it.
RETRO has 7 billion parameters, 25 times fewer than GPT-3, but can access an external database containing around 2 trillion pieces of text. DeepMind says the smaller model takes less time, energy and computing power to train, yet can still rival GPT-3's performance.
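The core idea can be illustrated with a toy sketch: instead of relying on facts memorised in its weights, the system retrieves the most relevant passage from an external store at query time and conditions its output on it. The code below is a deliberately simplified illustration of that general retrieval-augmented pattern, not DeepMind's actual method (RETRO retrieves chunks of text by approximate nearest-neighbour search over learned embeddings and feeds them into the transformer); the word-overlap similarity and the `answer` helper are assumptions for demonstration only.

```python
import re

def tokenize(text):
    """Split text into a set of lowercase word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, database, k=1):
    """Return the k passages sharing the most words with the query.

    A stand-in for RETRO's nearest-neighbour search over embeddings.
    """
    scored = sorted(database,
                    key=lambda p: len(tokenize(p) & tokenize(query)),
                    reverse=True)
    return scored[:k]

def answer(query, database):
    """Condition a hypothetical model on retrieved text, not memorised weights."""
    context = retrieve(query, database, k=1)[0]
    return f"context: {context} | query: {query}"

database = [
    "Ashleigh Barty won the Wimbledon women's singles title in 2021.",
    "Simona Halep won the Wimbledon women's singles title in 2019.",
]

# Retrieves the Barty passage, because it shares the most words with the query.
print(answer("Who won Wimbledon in 2021?", database))
```

The point of the design is that the 2-trillion-entry database carries the factual load, so the model itself can stay 25 times smaller.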
In a test against a standard language model with the same number of parameters but no ability to retrieve information, RETRO scored 45.5 on a benchmark measuring accurate answers to natural-language questions, while the control model scored only 30.4.
"Being able to look things up on the fly from a large knowledge base can often be helpful, rather than having to memorise everything," says Jack Rae at DeepMind. "The goal is simply to try to imitate human behaviour from what it can see on the internet."
There are other advantages to this approach as well. While AI models are usually black boxes whose inner workings are a mystery, it is possible to see which external data RETRO is referring to. That could make it possible to cite sources and give some basic explanation of how the model arrived at particular results.
It also makes the model easy to update, simply by changing the external data. For example, a traditional model trained in 2020 might answer a question about a Wimbledon champion with "Simona Halep", but RETRO would be able to draw on newer material and learn that "Ashleigh Barty" was the more up-to-date answer.
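This update mechanism can be sketched in a few lines: when the knowledge lives outside the model, refreshing an answer means editing the database, with no retraining of parameters. The snippet below is a hypothetical illustration of that property, not RETRO's API; the `facts` store and `lookup` helper are invented for the example.

```python
# A snapshot of external knowledge circa 2020, held outside the "model".
facts = {"wimbledon women's singles champion": "Simona Halep"}

def lookup(query, db):
    """Answer by consulting the external database rather than stored weights."""
    return db.get(query.lower(), "unknown")

print(lookup("Wimbledon women's singles champion", facts))  # Simona Halep

# Refreshing the database updates the answer instantly, with no retraining.
facts["wimbledon women's singles champion"] = "Ashleigh Barty"
print(lookup("Wimbledon women's singles champion", facts))  # Ashleigh Barty
```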
Samuel Bowman of New York University says the ideas behind RETRO are not necessarily new, but they matter because of DeepMind's influence in the AI field. "There's still a lot we don't know about how to safely and productively manage models at the current scale, and that's likely to get harder with scale in many ways, even if it gets easier in some."
One concern is that the high cost of large-scale AI could make it the preserve of big companies. "I'd like to see them not just pushing the boundaries here, because that could intensify the arms race," says Bowman.