When people hear “artificial intelligence,” many envision “big data.” There’s a reason for that: some of the most prominent AI breakthroughs in the past decade have relied on enormous data sets. Image classification made major strides in the 2010s thanks to the development of ImageNet, a data set containing millions of images hand sorted into thousands of categories. More recently GPT-3, a language model that uses deep learning to produce humanlike text, benefited from training on hundreds of billions of words of online text. So it is not surprising to see AI tightly connected with “big data” in the popular imagination. But AI is not only about large data sets, and research in “small data” approaches has grown extensively over the past decade, with so-called transfer learning as an especially promising example.
Often implemented as “fine-tuning,” transfer learning is helpful in settings where you have little data on the task of interest but abundant data on a related problem. It works like this: you first train a model on a big data set and then retrain it slightly on a smaller data set related to your specific problem. For example, by starting with an ImageNet classifier, researchers in Bangalore, India, used transfer learning to train a model to locate kidneys in ultrasound images using only 45 training examples. Likewise, a research team working on German-language speech recognition showed that they could improve their results by starting with an English-language speech model trained on a larger data set before using transfer learning to adjust that model for a smaller data set of German-language audio.
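The two-step recipe described above (train on a big data set, then retrain slightly on a small one) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the one-variable linear model, the synthetic data, and the names `fit`, `mse` and `make_data` are hypothetical and are not taken from the kidney-imaging or speech-recognition studies mentioned here. It only shows the general pattern of reusing learned weights as a starting point.

```python
import random

def fit(data, w, b, lr, steps):
    """Gradient descent on mean squared error for y ~ w*x + b, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(data, w, b):
    """Mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def make_data(n, w, b, noise, rng):
    """Synthetic (x, y) pairs drawn around the line y = w*x + b (illustrative only)."""
    points = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        points.append((x, w * x + b + rng.gauss(0.0, noise)))
    return points

rng = random.Random(0)

# Assumed setup: a large data set for a related "source" task and only
# five examples for the "target" task, whose true line is slightly different.
source = make_data(2000, 3.0, 1.0, 0.1, rng)
target_train = make_data(5, 3.3, 0.8, 0.1, rng)
target_test = make_data(500, 3.3, 0.8, 0.1, rng)

# Step 1: train on the big source data set, starting from zero weights.
w_pre, b_pre = fit(source, 0.0, 0.0, lr=0.1, steps=200)

# Step 2: retrain slightly (ten steps) on the small target data set,
# starting from the pretrained weights.
w_ft, b_ft = fit(target_train, w_pre, b_pre, lr=0.1, steps=10)

# Baseline: the same ten steps on the small data set, but from scratch.
w_scratch, b_scratch = fit(target_train, 0.0, 0.0, lr=0.1, steps=10)

print("fine-tuned test error:  ", mse(target_test, w_ft, b_ft))
print("from-scratch test error:", mse(target_test, w_scratch, b_scratch))
```

In this toy setting the fine-tuned model starts close to the right answer because the source and target tasks are similar, so a handful of examples and a few update steps are enough. In deep learning practice the same pattern usually means reusing the layers of a pretrained neural network rather than two numbers, but the sketch makes no claim about how any particular system does this.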
Research in transfer learning approaches has grown impressively over the past 10 years. In a new report for Georgetown University’s Center for Security and Emerging Technology (CSET), we examined current and projected progress in scientific research across “small data” approaches, broken down into five rough categories: transfer learning, data labeling, artificial data generation, Bayesian methods and reinforcement learning. Our analysis found that transfer learning stands out as the category that has experienced the most consistent and, on average, the highest research growth since 2010. This growth has even outpaced that of the larger and more established field of reinforcement learning, which in recent years has attracted widespread attention.
Furthermore, transfer learning research is expected to keep growing in the near future. Using a three-year growth forecast model, our analysis estimates that research on transfer learning methods will grow the fastest through 2023 among the small data categories we considered. In fact, the growth rate of transfer learning is forecast to be much higher than the growth rate of AI research as a whole. This suggests that transfer learning is likely to become more usable, and therefore more widely used, from here on out.
Small data approaches such as transfer learning offer numerous advantages over more data-intensive methods. By enabling the use of AI with less data, they can bolster progress in areas where little or no data exist, such as in forecasting natural hazards that occur relatively rarely or in predicting the risk of disease for a population that lacks digital health records. Some analysts believe that, so far, we have applied AI most successfully to problems where data were most available. In this context, approaches like transfer learning will become increasingly important as more organizations look to diversify AI application areas and venture into previously underexplored domains.
Another way of thinking about the value of transfer learning is in terms of generalization. A recurring challenge in the use of AI is that models need to “generalize” beyond their training data, that is, to give good “answers” (outputs) to a more general set of “questions” (inputs) than those they were specifically trained on. Because transfer learning carries knowledge from one task over to another, it can help improve generalization in the new task, even when only limited data are available.
Moreover, by starting from pretrained models, transfer learning can shorten training time and may also reduce the computational resources needed to train algorithms. This efficiency is significant, considering that training a single large neural network requires considerable energy and can emit as much as five times the lifetime carbon emissions of an average American car.
Of course, using pretrained models for new tasks works better in some cases than in others. If the source and target problems are not similar enough, it will be difficult to use transfer learning effectively. This is a problem for some fields, such as medical imaging, where certain tasks differ fundamentally in data size, features and task specifications from natural image data sets such as ImageNet. Researchers are still learning how useful information is transferred between models and how different model design choices hinder or facilitate successful transfer and fine-tuning. Hopefully, continued progress on these questions through academic research and practical experience will facilitate wider use of transfer learning over time.
AI experts such as Andrew Ng have emphasized the significance of transfer learning and have even stated that the approach will be the next driver of machine learning success in industry. There are some early signs of successful adoption. Transfer learning has been applied to cancer subtype discovery, video game playing, spam filtering and much more.
Despite the surge in research, transfer learning has had relatively little visibility. While many machine learning experts and data scientists are likely familiar with it at this point, techniques such as transfer learning do not seem to have reached the awareness of the broader community of policy makers and business leaders who make important decisions about AI funding and adoption.
By acknowledging the success of small data techniques like transfer learning—and allocating resources to support their widespread use—we can help overcome some of the pervasive misconceptions regarding the role of data in AI and foster innovation in new directions.