Supervised fine-tuning
Supervised fine-tuning (SFT) is a machine learning technique for adapting a pre-trained large language model (LLM) to a specific downstream task, such as text classification, natural language inference, named entity recognition, or question answering, using labeled data. In SFT, the pre-trained LLM is fine-tuned on a dataset of input-output pairs, where the input is the prompt or instruction for the task and the output is the desired response.
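The idea can be sketched with a deliberately tiny stand-in for an LLM. In this illustrative example (all names and the toy model are invented for the sketch, not a real SFT implementation), the "pre-trained model" is a one-step next-token predictor with uniform weights, and fine-tuning applies supervised cross-entropy updates on labeled prompt-response pairs so the desired response becomes more likely:

```python
import math

# Hypothetical labeled input-output pairs, as described above:
# the input is the prompt, the output is the desired response.
sft_pairs = [
    ("color of sky", "blue"),
    ("color of grass", "green"),
]

# Build a tiny vocabulary from the data.
vocab = sorted({tok for p, r in sft_pairs for tok in (p + " " + r).split()})
idx = {t: i for i, t in enumerate(vocab)}
V = len(vocab)

# "Pre-trained" weights: all-zero logits (uniform predictions) stand in
# for a real pre-trained LLM's parameters.
W = [[0.0] * V for _ in range(V)]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def prob_of_response(prompt, response):
    # Probability the toy model assigns to the response token given the
    # last prompt token (a stand-in for full autoregressive decoding).
    last = idx[prompt.split()[-1]]
    return softmax(W[last])[idx[response]]

def sft_step(prompt, response, lr=0.5):
    # One supervised step: gradient of cross-entropy loss between the
    # model's prediction and the labeled response.
    last = idx[prompt.split()[-1]]
    target = idx[response]
    probs = softmax(W[last])
    for j in range(V):
        grad = probs[j] - (1.0 if j == target else 0.0)
        W[last][j] -= lr * grad

before = prob_of_response("color of sky", "blue")
for _ in range(50):
    for p, r in sft_pairs:
        sft_step(p, r)
after = prob_of_response("color of sky", "blue")
print(f"P('blue' | prompt) before SFT: {before:.3f}, after: {after:.3f}")
```

In a real setting the model would be a full transformer with billions of parameters and the loss would be averaged over every token of the response, but the training signal is the same: labeled pairs supervise small gradient adjustments to already-trained weights.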
Explained like I'm 10 years old
Imagine you have a really clever robot friend who already knows loads about the world - they can read, write, and understand most things people say to them. This robot is like a pre-trained language model.
Now, let's say you want to teach this robot to become brilliant at one specific job - maybe you want them to be the best teacher's assistant ever, or to become amazing at answering science questions, or to be fantastic at sorting emails.
Even though your robot friend is already quite clever, they need some special training to get really good at that one particular job. So you give them lots and lots of practice examples:
- You show them a science question and the perfect answer
- Then another science question and its perfect answer
- Then hundreds more examples just like this
This special training with all these examples is called "supervised fine-tuning" or SFT for short. It's "supervised" because you're watching over the robot and showing them the right answers (like how a teacher supervises your learning). It's "fine-tuning" because you're making small adjustments to make them even better at this specific task.
After all this practice, your robot friend becomes absolutely brilliant at answering science questions - much better than they were before, even though they were already quite smart to begin with!
It's a bit like how you might already know how to play football, but if you want to become amazing at being a goalkeeper specifically, you'd need lots of special practice just for goalkeeping.