What is fine-tuning?
By James Smith · 3 minute read
One of my family members used to build voice recognition systems from scratch for large organisations, creating systems that stitched together the individual syllables of text. This was the way of the world for a long time - systems were built from scratch for specific use cases.
AI model development has evolved significantly over the past two decades. There was a time when training a model from scratch for every new problem was the norm. The shift towards using pretrained models began gaining traction around 2013, notably when image models trained on datasets like ImageNet started dominating competitions.
Instead of training a model from scratch, using a pretrained model as a starting point and then fine-tuning it for a specific task became common practice. This not only saved time and computational resources but also improved model performance, especially on image classification tasks.
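As a rough sketch of what that looks like in practice, the snippet below loads an ImageNet-pretrained ResNet-18 with PyTorch and a recent torchvision, swaps in a new classification head, and continues training on a task-specific dataset. The model choice, folder layout, and hyperparameters are illustrative assumptions rather than a prescribed recipe.

```python
# Minimal sketch of fine-tuning a pretrained image classifier with PyTorch.
# The model choice, dataset folder, and hyperparameters are illustrative assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Load a ResNet-18 pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Replace the final classification layer to match our own task (e.g. 5 classes).
model.fc = nn.Linear(model.fc.in_features, 5)

# Hypothetical dataset laid out as one folder per class.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_data = datasets.ImageFolder("data/train", transform=preprocess)
loader = DataLoader(train_data, batch_size=32, shuffle=True)

# Fine-tune: continue training the whole network at a small learning rate.
optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
model.train()
for epoch in range(3):
    for images, labels in loader:
        optimiser.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimiser.step()
```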
What are pre-trained models?
Pretrained AI models, particularly LLMs, are trained on massive amounts of data, often on the order of trillions of tokens, which enables them to capture a broad understanding of human language and context.
This is in contrast to older machine learning approaches, which typically required more manual feature engineering and were not pretrained on such vast and diverse datasets.
The evolution of AI, particularly the rise of LLMs and generative AI, has led to significant advances in natural language understanding and generation, allowing these models to exhibit a level of language comprehension well beyond traditional machine learning approaches.
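To make this concrete, here is a small, assumed example of using a pretrained model straight out of the box with the Hugging Face transformers library; GPT-2 is chosen only because it is small and freely available, not because it is the right model for any particular job.

```python
# A quick illustration (assumed setup) of using a pretrained language model
# out of the box with the Hugging Face transformers library.
from transformers import pipeline

# "gpt2" is just a small, freely available example of a pretrained model.
generator = pipeline("text-generation", model="gpt2")

result = generator("Transfer learning is useful because", max_new_tokens=30)
print(result[0]["generated_text"])
```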
The concept of transfer learning
Transfer learning involves applying knowledge or skills acquired in one domain to solve problems in another domain.
In AI, this means leveraging a model's learning from one task (e.g., text generation) to perform another (e.g., language translation).
While transfer learning is a broader concept than pretraining, the two are intimately connected. Pretraining a model on a large dataset equips it with a broad base of knowledge, which can then be transferred to a variety of specific tasks.
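One lightweight way to transfer that base knowledge is feature extraction: keep the pretrained model frozen and train only a small classifier on top of its representations. The sketch below assumes DistilBERT as the encoder and a tiny made-up dataset purely for illustration.

```python
# A minimal sketch of transfer learning by feature extraction: a pretrained text
# encoder is frozen and only a small classifier is trained on top of it.
# The model name, example texts, and labels are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
encoder.eval()  # the pretrained knowledge stays fixed

def embed(texts):
    # Reuse the pretrained model's representation of each text.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    return hidden.mean(dim=1).numpy()  # one vector per text

# Tiny, made-up dataset for a new task the encoder was never trained on.
texts = ["Great essay, clear argument.", "Rambling and off topic.",
         "Well structured response.", "Does not answer the question."]
labels = [1, 0, 1, 0]

# Only this small head is trained; the pretrained knowledge transfers for free.
classifier = LogisticRegression().fit(embed(texts), labels)
print(classifier.predict(embed(["A focused, well argued answer."])))
```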
The essence of transfer learning is captured elegantly by Josh Waitzkin in his book, "The Art of Learning." He describes how insights from chess can apply to other areas like sports or art, highlighting the fluidity of knowledge across different fields.
In the context of machine learning, large models like ChatGPT are trained on general tasks (like text generation) and then adapted to specific domains (like code generation or essay marking), showcasing transfer learning in action.
Transfer learning represents a fundamental shift in how we approach problem-solving in AI, enabling more versatile, efficient, and resourceful use of pretrained models.
Fine-tuning pre-trained language models
Understanding fine-tuning:
Fine-tuning involves adjusting a pretrained model to a particular task by training it further on a dataset specific to that task.
In the case of essay marking, the model arrives with a broad understanding of the English language from its initial training; fine-tuning aligns its capabilities more closely with the specific requirements and nuances of the task, such as grading a written response consistently.
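Here is a hedged sketch of what that fine-tuning step can look like, framed as sequence classification with the Hugging Face Trainer. The model name, the two-example dataset, and the hyperparameters are placeholders, not a recommended configuration.

```python
# A sketch of fine-tuning a pretrained language model as an essay marker,
# framed here as sequence classification. Model name, example data, and
# hyperparameters are assumptions for illustration only.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny, made-up dataset: essays labelled 1 (pass) or 0 (fail).
data = Dataset.from_dict({
    "text": ["A clear, well evidenced argument...", "Off topic and unstructured..."],
    "label": [1, 0],
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="essay-marker", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()
```

In practice you would want far more labelled essays and a held-out evaluation set, but the shape of the workflow is the same: start from a pretrained checkpoint and continue training on task-specific data.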
How does fine-tuning differ from other forms of learning?
Zero-shot learning:
Zero-shot learning (ZSL) enables a pre-trained model to handle classes or tasks it never received any specific training for - in other words, an educated stab in the dark. When you ask ChatGPT a question without giving it any examples and it produces an answer, that is zero-shot learning. The effectiveness of this method depends heavily on crafting the best possible prompt.
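A zero-shot request might look like the following sketch, using the OpenAI Python client; the model name is an assumption and any capable chat model would do.

```python
# Zero-shot prompting sketch using the OpenAI Python client; the model name is
# an assumption, and the API key is read from the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # No examples are provided: the model relies entirely on its pretraining.
        {"role": "user", "content": "Classify this review as positive or negative: "
                                    "'The battery barely lasts an hour.'"},
    ],
)
print(response.choices[0].message.content)
```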
One-shot and few-shot learning:
This type of learning involves showing the model one or a few examples of the desired output - for instance, a handful of samples of what you would like to see - and then asking it to generate a new version that closely mimics them. Retrieval-augmented generation (RAG) systems can help facilitate such approaches by retrieving relevant examples automatically.
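A few-shot version of the same kind of request simply packs worked examples into the prompt, as in this sketch (the examples and grading task are made up):

```python
# Few-shot prompting sketch: the prompt carries a handful of worked examples
# for the model to imitate. The examples and model name are assumptions.
from openai import OpenAI

client = OpenAI()

few_shot_prompt = """Mark each answer as PASS or FAIL.

Answer: "Photosynthesis converts light energy into chemical energy." -> PASS
Answer: "Photosynthesis is when plants sleep at night." -> FAIL
Answer: "Plants use sunlight, water and CO2 to make glucose." -> PASS

Answer: "Photosynthesis happens in the mitochondria." ->"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.choices[0].message.content)
```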
When should you opt for fine-tuning?
Fine-tuning is ideal when the task requires a level of specificity or precision beyond what the pretrained model offers.
"Fine-tuning improves on few-shot learning by training on many more examples than can fit in the prompt, letting you achieve better results on a wide number of tasks." - OpenAI
The decision to fine-tune should consider the effort involved in gathering training data and the potential improvement in performance for the specific use case.
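To give a feel for that data-gathering effort, the sketch below writes a handful of made-up essay-marking examples into the JSON Lines chat format that OpenAI's fine-tuning API expects; in practice you would need many such examples, and the prompts and grades shown are purely illustrative.

```python
# A sketch of what gathering fine-tuning data can look like: each example pairs
# a prompt with the ideal response, written out as JSON Lines in the chat format
# used by OpenAI's fine-tuning API. The examples themselves are made up.
import json

examples = [
    {"essay": "The Treaty of Versailles ended WW1 in 1919...", "grade": "B"},
    {"essay": "WW1 ended because everyone got bored...", "grade": "E"},
]

with open("essay_marking_train.jsonl", "w") as f:
    for ex in examples:
        record = {
            "messages": [
                {"role": "system", "content": "You are an essay marker. Reply with a grade A-E."},
                {"role": "user", "content": ex["essay"]},
                {"role": "assistant", "content": ex["grade"]},
            ]
        }
        f.write(json.dumps(record) + "\n")
```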
As AI continues to develop, the ability to fine-tune models will become increasingly important, providing a pathway to more sophisticated, efficient, and personalised AI solutions.