This year, GPT-3, a large language model capable of understanding text, responding to questions and generating new writing examples, has drawn international media attention. The model, released by OpenAI, a California-based nonprofit that builds general-purpose artificial intelligence systems, has an impressive ability to mimic human writing, but just as notable is its massive size. To build it, researchers designed a model with 175 billion parameters (the adjustable values a network tunes during training), gathered more than 45 terabytes of text from Common Crawl, Reddit, Wikipedia and other sources, and then trained it in a process that occupied hundreds of processing units for thousands of hours.
GPT-3 demonstrates a broader trend in artificial intelligence. Deep learning, which has in recent years become the dominant technique for creating new AIs, uses enormous amounts of data and computing power to fuel complex, accurate models. These resources are more accessible for researchers at large companies and elite universities. As a result, a study from Western University suggests, there has been a "de-democratization" in AI: the number of researchers able to contribute to cutting-edge developments is shrinking. This narrows the pool of people who are able to define the research directions for this pivotal technology, which has social implications. It may even be contributing to some of the ethical challenges facing AI development, including privacy invasion, bias and the environmental impact of large models.
To combat these problems, researchers are trying to figure out how to do more with less. One such recent advance is called “less than one”–shot learning (LO-shot learning), developed by Ilia Sucholutsky and Matthias Schonlau from the University of Waterloo. The principle behind LO-shot learning is that it should be possible for an AI to learn about objects in the world without being fed an example of each one. This has been a major hurdle for contemporary AI systems, which often require thousands of examples to learn to distinguish objects. Humans, on the other hand, are often able to abstract away from existing examples in order to recognize items they have never seen before. For example, when shown different shapes, a child can easily distinguish between the examples and recognize the relationships between those shapes and new ones.
The team first introduced this sort of learning through a process called soft distillation. MNIST, an image database maintained by the National Institute of Standards and Technology that contains 60,000 examples of handwritten digits from 0 to 9, was distilled down to five images that blended features of the various numbers. After being shown only those five examples, the University of Waterloo system was able to accurately classify 92 percent of the remaining images in the database.
In their latest paper, the team has extended this principle to show that, in theory, LO-shot techniques could allow AIs to learn to distinguish thousands of objects given a data set as small as two examples. This is a great improvement on traditional deep-learning systems, in which the demand for data grows exponentially with the need to distinguish more objects. Currently, LO-shot’s small data sets need to be carefully engineered to distill the features of the various classes of objects. But Sucholutsky is seeking to further develop this work by looking at the relationships between objects already captured in existing small data sets.
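To make the idea of learning more classes than there are examples concrete, here is a minimal sketch. It assumes a distance-weighted nearest-prototype classifier in which each example carries a "soft" label, that is, a set of class probabilities rather than a single class. The function name, the one-dimensional setup, the prototype positions and the label values are all hypothetical choices made for illustration; they are not taken from the Waterloo team's paper.

```python
def classify(x, prototypes, soft_labels):
    """Pick the class with the largest distance-weighted soft-label score.

    Hypothetical 1-D illustration: each prototype votes for every class in
    proportion to its soft label, and closer prototypes get more weight.
    """
    n_classes = len(soft_labels[0])
    scores = [0.0] * n_classes
    for position, label in zip(prototypes, soft_labels):
        distance = abs(x - position)
        if distance == 0.0:
            # The point sits exactly on a prototype: use that prototype's label.
            return max(range(n_classes), key=lambda c: label[c])
        for c in range(n_classes):
            scores[c] += label[c] / distance  # inverse-distance weighting
    return max(range(n_classes), key=lambda c: scores[c])


# Two examples (assumed positions on a line), three classes.
PROTOTYPES = [0.0, 1.0]
# Assumed soft labels: each example is mostly one class, partly a shared one.
SOFT_LABELS = [
    [0.6, 0.4, 0.0],  # mostly class 0, partly class 1
    [0.0, 0.4, 0.6],  # partly class 1, mostly class 2
]

predictions = [classify(x, PROTOTYPES, SOFT_LABELS) for x in (0.1, 0.5, 0.9)]
print(predictions)  # [0, 1, 2]
```

Neither example is labeled mainly as class 1, yet points midway between the two are assigned to it, because both examples contribute a share of their label to that class. In this toy setup, two examples are enough to carve out three classes.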
Allowing AIs to learn with considerably less data is important for several reasons. First, it better encapsulates the actual process of learning by forcing the system to generalize to classes it has not seen. By building in abstractions that capture the relationships between objects, this technique also reduces the potential for bias. Currently, deep-learning systems fall prey to bias arising from irrelevant features in the data they are trained on. A well-known example of this problem is an AI system that classifies dogs as wolves when shown images of dogs in a snowy environment, because most images of wolves feature them near snow. Being able to zero in on relevant aspects of the image would help prevent these mistakes. Reducing data needs thus makes these systems less liable to this sort of bias.
Next, the less data one needs, the less incentive exists to surveil people to build better algorithms. For example, soft distillation techniques have already impacted medical AI research, which trains its models using sensitive health information. In one recent paper, researchers applied soft distillation to diagnostic X-ray imagery, working from a small, privacy-preserving data set.
Finally, allowing AIs to learn with less plentiful data helps to democratize the field of artificial intelligence. With smaller AIs, academia can remain relevant and avoid the risk of professors being poached by industry. Not only does LO-shot learning lower the barriers to entry by reducing training costs and data requirements, but it also provides more flexibility for users to create novel data sets and experiment with new approaches. By reducing the time spent on data and architecture engineering, researchers looking to leverage AI can spend more time focusing on the practical problems they are aiming to solve.