The Looming Problem of Computation Capacity for Deep Learning Algorithms

Deep Learning Algorithms

Deep learning is a subset of machine learning algorithms that uses multiple layers of artificial neural networks (ANN) to extract latent features from the input data, eliminating the necessity of manual feature engineering. ANNs are the mathematical model of our human brain.

Deep learning models are popular because they can achieve optimal performance given enough training data and compute power. Also, the growing number of frameworks for model development and deployment has made it easier for any IT company to integrate deep learning models in their businesses. Spell is a popular deep learning model serving platform that makes model deployment and maintenance easier for busy engineers.

But what will happen when we can’t make any more improvements in terms of computation power? This has become a looming problem for deep learning algorithms. To make the context clear, let’s start by looking at some of the recent state-of-the-art models and the resources dedicated to training some of the most comprehensive deep learning models.

BERT

BERT stands for Bidirectional Encoder Representation from Transformers. It was developed by Google researchers and took the natural language processing (NLP) community by storm. It was able to obtain high-quality results in a wide range of NLP tasks like question answering, natural language inference, sentiment classification, etc. BERT is a huge model with two variants. BERT large has around 345 million parameters, whereas BERT base has around 110 million parameters. The BERT base was trained on four cloud TPUs for four days, whereas the BERT large was trained on 16 cloud TPUs for four days. The total price of one-time pre-training for the BERT large model can be estimated to be around $7,000.

GPT 2

GPT-2 is an extensive language model built by OpenAI recently that can generate realistic text paragraphs. Despite the lack of task-specific training data, the model performs admirably in various linguistic tasks, including machine translation, question answering, reading comprehension, and summarization. GPT has around 1.5 billion parameters. The GPT-2 model was trained with 256 Google Cloud TPU v3 cores, which cost $256 per hour.

GPT 3

GPT-3 is the third iteration of GPT models developed by OpenAI. GPT-3’s full version has a capacity of 175 billion machine learning parameters. The high quality of the text generated by GPT-3 makes it difficult to determine whether or not a human wrote it.

GPT-3 showed that a language model with adequate training data could solve NLP tasks it has never seen before.

GPT-3 investigates the model as a comprehensive solution for various downstream operations that do not require fine-tuning. Training GPT-3 on a Tesla V100, the fastest GPU on the market, is estimated to take 355 years, and training GPT-3 on the cheapest GPU cloud service would cost US$4.6 million.

The Price of Progress

In 2018, OpenAI found that since 2012, the amount of computational power used to train the state-of-the-art AI models has doubled every 3.4 months. This significant increase in the resources required demonstrates how expensive the field’s achievements have become. The graph below is a logarithmic scale. The difference becomes massive if we view it on a linear scale.

Researchers from MIT have recently published an article that warn deep learning is approaching its computational limits.

They’ve demonstrated that deep learning is computationally expensive by design, not by accident. Its flexibility, which allows it to predict a broad range of phenomenons and beat expert models, also makes it far more computationally expensive. Progress in training models has been dependent on huge increases in the amount of processing power used in various areas of deep learning.

According to researchers, three years of algorithmic development equates to a tenfold gain in computing capacity. A decade earlier, the ‘AI winter’ came to an end, and new benchmarks for computer performance on a wide range of activities have been set thanks to the surge in processing power employed for deep learning models.

Power Limitations

Deep learning’s voracious thirst for processing power has once again placed a limit on how far it can enhance performance in its current form, especially at a time when hardware performance is stagnating. These computing restrictions will likely force machine learning to use less computationally intensive strategies than deep learning. This can be taken as the looming problem of computation capacity for deep learning algorithms.

Tags: Deep Learning Algorithms