
Beyond LLMs: Here's Why Small Language Models Are the Future of AI

Published on 2024-11-01

Large language models (LLMs) entered the mainstream with the release of OpenAI's ChatGPT. Since then, several companies have launched their own LLMs, but a growing number are now leaning towards small language models (SLMs).

SLMs are gaining momentum, but what are they, and how do they differ from LLMs?

What Is a Small Language Model?

A small language model (SLM) is a type of artificial intelligence model with fewer parameters than an LLM (a parameter is a value in the model that is learned during training). Like their larger counterparts, SLMs can generate text and perform other tasks. However, SLMs are trained on less data, have fewer parameters, and require less computational power to train and run.

SLMs focus on key functionalities, and their small footprint means they can be deployed on a wide range of devices, including those without high-end hardware, such as mobile phones. For example, Google's Gemini Nano is an on-device SLM built from the ground up to run on mobile devices. Because of its small size, Nano can run locally with or without network connectivity, according to the company.

Besides Nano, there are many other SLMs from established and emerging companies in the AI space. Some popular SLMs include Microsoft's Phi-3, OpenAI's GPT-4o mini, Anthropic's Claude 3 Haiku, Meta's Llama 3, and Mistral AI's Mixtral 8x7B.

Some models that you might assume are LLMs are in fact SLMs. This is especially likely because most companies now take a multi-model approach, releasing several language models in one portfolio that spans both LLMs and SLMs. One example is OpenAI's GPT-4 family, which includes GPT-4, GPT-4o (Omni), and GPT-4o mini.

Small Language Models vs. Large Language Models

While discussing SLMs, we can't ignore their big counterparts: LLMs. The key difference between an SLM and an LLM is the model size, which is measured in terms of parameters.

As of this writing, there's no consensus in the AI industry on the maximum number of parameters a model can have and still count as an SLM, or on the minimum number required to count as an LLM. However, SLMs typically have millions to a few billion parameters, while LLMs have more, going as high as trillions.

For example, GPT-3, which was released in 2020, has 175 billion parameters (and the GPT-4 model is rumored to have around 1.76 trillion), while Microsoft's 2024 Phi-3-mini, Phi-3-small, and Phi-3-medium SLMs measure 3.8, 7, and 14 billion parameters, respectively.
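To get a feel for what these parameter counts mean in practice, here is a rough sketch that estimates the memory needed just to store each model's weights. It assumes every parameter is stored as a 16-bit value (2 bytes), which is an assumption for illustration and not a figure from any vendor; the helper name weight_memory_gb is also hypothetical. The estimate ignores everything else a running model needs, so treat it as a lower bound.

```python
# Back-of-the-envelope estimate of the memory needed to hold model weights.
# Assumption (not from the article): each parameter is stored as a 16-bit
# value, i.e. 2 bytes. Real deployments may use more or fewer bytes.
BYTES_PER_PARAM = 2

# Parameter counts as quoted in the article.
PARAM_COUNTS = {
    "Phi-3-mini": 3_800_000_000,
    "Phi-3-small": 7_000_000_000,
    "Phi-3-medium": 14_000_000_000,
    "GPT-3": 175_000_000_000,
}


def weight_memory_gb(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return the approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 10**9


for name, count in PARAM_COUNTS.items():
    print(f"{name}: ~{weight_memory_gb(count):.1f} GB of weights")
```

Under that assumption, Phi-3-mini comes out at about 7.6 GB of weights and GPT-3 at about 350 GB, a gap of roughly 46 times that simply mirrors the difference in parameter counts.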

Another differentiating factor between SLMs and LLMs is the amount of data used for training. SLMs are trained on smaller amounts of data, while LLMs use large datasets. This difference also affects the model's capability to solve complex tasks.

Because of the large volume of data used in training, LLMs are better suited to complex tasks that require advanced reasoning, while SLMs are better suited to simpler tasks. Since SLMs use less training data, that data must be of higher quality if they are to deliver many of the capabilities found in LLMs in a much smaller package.

Why Small Language Models Are the Future

For most use cases, SLMs are better positioned to become the mainstream models that companies and consumers use to perform a wide variety of tasks. Sure, LLMs have their advantages and are better suited to certain jobs, such as solving complex tasks. However, SLMs are the future for the following reasons.

1. Lower Training and Maintenance Cost

SLMs need less training data than LLMs, which makes them the most viable option for individuals and small to medium-sized companies with limited training data, finances, or both. LLMs require large amounts of training data and, by extension, huge computational resources to both train and run.

To put this into perspective, OpenAI's CEO, Sam Altman, confirmed while speaking at an event at MIT that training GPT-4 cost more than $100 million (as per Wired). Another example is Meta's OPT-175B LLM. Meta says it was trained using 992 NVIDIA A100 80GB GPUs, which cost roughly $10,000 per unit, as per CNBC. That puts the hardware cost at roughly $9.9 million, without including other expenses like energy, salaries, and more.
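The GPU figure for OPT-175B is simple multiplication, and a short sketch makes the arithmetic easy to check or to rerun with different prices. The GPU count and the rough per-unit price are the ones quoted above; the function name hardware_cost_usd is made up for this example, and the result covers GPU purchases only.

```python
# Figures quoted in the article: 992 GPUs at roughly $10,000 each.
GPU_COUNT = 992
APPROX_PRICE_PER_GPU_USD = 10_000  # rough per-unit price, as reported by CNBC


def hardware_cost_usd(gpu_count: int, price_per_gpu: int) -> int:
    """Return the GPU purchase cost only (no energy, salaries, or other costs)."""
    return gpu_count * price_per_gpu


total = hardware_cost_usd(GPU_COUNT, APPROX_PRICE_PER_GPU_USD)
print(f"Estimated GPU hardware cost: ${total:,}")  # prints $9,920,000
```

That works out to $9.92 million for the GPUs alone, which is still less than a tenth of the amount cited for training GPT-4.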

With such figures, it's not viable for small and medium companies to train an LLM. In contrast, SLMs have a lower barrier to entry resource-wise and cost less to run, and thus, more companies will embrace them.

2. Better Performance

Performance is another area where SLMs beat LLMs, thanks to their compact size. SLMs have lower latency and are better suited to scenarios where fast responses are needed, such as real-time applications. For example, quick replies matter in voice response systems like digital assistants.

Running on-device (more on this later) also means your request doesn't have to travel to online servers and back before you get an answer, leading to faster responses.

3. More Accurate

When it comes to generative AI, one thing remains constant: garbage in, garbage out. Current LLMs have been trained on large datasets of raw internet data, so they might not be accurate in all situations. This is one of the problems with ChatGPT and similar models, and it's why you shouldn't trust everything an AI chatbot says. SLMs, on the other hand, are trained on higher-quality data than LLMs and can therefore be more accurate.

SLMs can also be fine-tuned further with focused training on specific tasks or domains, leading to better accuracy in those areas compared to larger, more generalized models.

4. Can Run On-Device

SLMs need less computational power than LLMs and thus are ideal for edge computing use cases. They can be deployed on edge devices like smartphones and autonomous vehicles, which have limited computational power and resources. Google's Gemini Nano model can run on-device, allowing it to work even when you don't have an active internet connection.

This ability is a win-win for both consumers and companies. First, it's a win for privacy, as user data is processed locally rather than sent to the cloud. That matters as more AI is integrated into our smartphones, which contain nearly every detail about us. It is also a win for companies, as they don't need to deploy and run large servers to handle AI tasks.

SLMs are gaining momentum, with the largest industry players, such as OpenAI, Google, Microsoft, Anthropic, and Meta, releasing such models. These models are better suited to simpler tasks, which is what most of us use LLMs for; hence, they are the future.

But LLMs aren't going anywhere. Instead, they will be used for advanced applications that combine information across different domains to create something new, like in medical research.

This article is reproduced from: https://www.makeuseof.com/why-small-language-models-are-the-future-of-ai/