Learning from fine-tuning data sets

Sarah Packowski
5 min read · Jul 18, 2023


Viewing prompts in data sets used to fine-tune a large language model can help you understand the behaviour of that model.

Few-shot prompting

A previous post described the “skeleton key” of large language model prompts: the follow-the-pattern-with-examples prompt, more properly called few-shot prompting.
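
To make the pattern concrete, here's a minimal sketch (my own Python illustration, not anything specific to watsonx.ai) of how a few-shot prompt is assembled: example input-output pairs concatenated ahead of the new input you want the model to complete. The examples are abbreviated from the samples at the bottom of this post:

```python
# A few-shot prompt is just example input-output pairs concatenated
# ahead of the new input you want the model to complete.
examples = [
    ('What are all the plants listed in the following story? '
     '"We planted a tree, trimmed some hedges, and pruned some flowers."',
     "There were a tree, hedges, and flowers."),
    ('What are all the animals listed in the following story? '
     '"We saw many birds, many dogs, and one cat on a leash!"',
     "They saw birds, dogs, and a cat."),
]

new_input = ('What are all the modes of transportation listed in the '
             'following story? "We took a plane, a train, and a ferry."')

prompt = "\n\n".join(f"{q}\n{a}" for q, a in examples) + "\n\n" + new_input
print(prompt)
```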

Sample 1

*Black text is what I typed; highlighted text is output generated by the model.

Submitting a few-shot prompt to an LLM in IBM watsonx.ai

IBM watsonx.ai

The image above shows Prompt Lab, a new tool in watsonx.ai for experimenting with prompts for LLMs.

*If you want to try the samples from this post in the freeform editor of the watsonx.ai Prompt Lab, the prompt text, model, and parameter details for each sample are listed at the bottom of this post.

Zero-shot prompting

Specifying example input-output pairs in your prompt has drawbacks. For example, it’s a nuisance to have to type them when you’re experimenting with different prompts. Sometimes, the cost to use a model is based on the length of your prompt, so specifying those examples in your prompts costs more.

As it happens, you can take those same example input-output pairs and perform supervised training to improve how a model works for your use case. This is called fine-tuning.
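
As a rough sketch, the training records are just those same pairs. The "inputs"/"targets" field names here follow the convention used by the xP3 data set discussed later in this post; other data sets use other names:

```python
# Fine-tuning data is just input-output pairs stored as records.
# The "inputs"/"targets" field names follow the bigscience/xP3 data set
# discussed later in this post; other data sets use other names.
fine_tuning_examples = [
    {
        "inputs": 'What are all the plants listed in the following story? '
                  '"We planted a tree, trimmed some hedges, and pruned some flowers."',
        "targets": "There were a tree, hedges, and flowers.",
    },
    {
        "inputs": 'What are all the animals listed in the following story? '
                  '"We saw many birds, many dogs, and one cat on a leash!"',
        "targets": "They saw birds, dogs, and a cat.",
    },
]
```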

When a model has been fine-tuned for your use case, you can successfully prompt that model without having to specify input-output pairs in your prompt. (When there are no example input-output pairs in your prompt text, that’s often called zero-shot prompting.)

Sample 2

When a prompt with no example input-output pairs is submitted to a model that has not been fine-tuned, the result isn’t useful (again, the black text is what I typed and the highlighted text is the output from the model):

Submitting a zero-shot prompt to a model that has not been fine-tuned

Sample 3

When the same zero-shot prompt is submitted to a model that has been fine-tuned for this use case, the result is useful:

Submitting a zero-shot prompt to a model that has been fine-tuned

Hallucinations

So, what was going wrong with sample 2? When a model goes off on a tangent or rambles on like this, the output is sometimes called a hallucination.

But remember this from the previous post:

When you prompt a text-generating LLM, the model doesn’t answer your prompt, analyze your prompt, or respond to your prompt. The model continues your prompt text.

Without any examples to go on, the model just continues with the most likely next word, and the next, and so on.
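
You can see this for yourself by inspecting the probabilities a model assigns to the next token. Here's a minimal sketch using the Hugging Face transformers library and a small model (gpt2, purely as a lightweight illustration, not one of the models in the samples):

```python
# Inspect which next tokens a causal LM considers most likely:
# the model isn't "answering" the prompt, it's continuing it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "What are all the modes of transportation listed in the following story?"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Probability distribution over the very next token after the prompt
next_token_probs = logits[0, -1].softmax(dim=-1)
top = next_token_probs.topk(5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r:>12}  {prob.item():.3f}")
```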

Fine-tuning data sets

How do you know if a model has been fine-tuned for your use case? And even if it has, how do you know how to write a successful prompt?

For each model hosted in watsonx.ai, you can see details about how the model was created, pretrained, and fine-tuned. See: Models supported in watsonx.ai

The model in sample 3 above is: mt0-xxl-13b

That’s an open source model from the BigScience project, hosted on Hugging Face, and here’s its model card: mt0-xxl model card

That model card describes how the model was fine-tuned and lists the data sets used, such as this one: bigscience/xP3

bigscience/xP3 data set on the Hugging Face website

Looking through the pairs of input prompt text and corresponding expected output in that data set is fascinating for several reasons:

  • You can see lots of different use cases
  • You can see lots of example prompt styles
  • You can better understand the behaviour of the model (you can literally see how it was trained to behave, given certain prompts)
  • You can pattern your prompts after the fine-tuning data set

Fine-tuning data set deep dive

The Hugging Face website has APIs you can use to work with data sets. So, I used the datasets-server API to download a preview (~100 prompt-output pairs) to my local computer (see the sketch after this list), then formatted the data to make it easier to understand:

  • Formatted the results into an HTML file to navigate in a browser
  • Grouped similar use cases together
  • Went through the text of each prompt, highlighting the instruction part of the prompt
  • Removed the non-English examples
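
Here's a minimal sketch of the download and formatting step, using the datasets-server /rows endpoint. The "en" config and "train" split are my assumptions based on how xP3 is organized by language on the Hugging Face Hub:

```python
# Download a preview of the bigscience/xP3 data set using the Hugging Face
# datasets-server API, then write a simple HTML file for browsing.
import html
import requests

url = "https://datasets-server.huggingface.co/rows"
params = {
    "dataset": "bigscience/xP3",
    "config": "en",     # assumption: xP3 configs are language codes
    "split": "train",
    "offset": 0,
    "length": 100,      # the API caps each request at 100 rows
}
rows = requests.get(url, params=params, timeout=30).json()["rows"]

with open("xP3-preview.html", "w", encoding="utf-8") as f:
    f.write("<html><body>\n")
    for item in rows:
        row = item["row"]
        f.write("<p><b>inputs:</b> %s<br>" % html.escape(row["inputs"]))
        f.write("<b>targets:</b> %s</p>\n" % html.escape(row["targets"]))
    f.write("</body></html>\n")
```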

You can download the HTML file from here: xP3-data-set-preview.html

The following video walks through this file:

Deep dive into part of the bigscience/xP3 data set

Conclusion

Prompt engineering is part art and part science. It can be overwhelming and frustrating to learn how to prompt large language models, because there don’t seem to be very many concrete rules for how to write effective prompts. When prompting a fine-tuned model, the data sets used for the fine-tuning can be a helpful source of information.

Try the samples yourself

Use the details below to try the samples from this post yourself in the freeform editor of the Prompt Lab in watsonx.ai.

Sample 1: Follow a pattern

Model: gpt-neox-20b
Decoding: Greedy
Repetition penalty: 1.0
Max tokens: 40
Prompt text:
What are all the plants listed in the following story? "What a busy day! We planted a tree, trimmed some hedges, and pruned some flowers."
There were a tree, hedges, and flowers.

What are all the animals listed in the following story? "On our drive, we saw many birds. When we arrived at the park, many people were walking their dogs. One person even had their cat on a leash!"
They saw birds, dogs, and a cat.

What are all the foods listed in the following story? "This morning, I had eggs and toast for breakfast. Later, I had soup for lunch. My friends got together for pizza for dinner."
They had eggs, toast, soup, and pizza.

What are all the modes of transportation listed in the following story? "I drive my car to the airport, then took a plane to London. From there, I took a train to my family's house. The next day, we took a ferry to the island."

Sample 2: Zero-shot prompt with a model that is not fine-tuned

Model: gpt-neox-20b
Decoding: Greedy
Repetition penalty: 1.0
Max tokens: 40
Prompt text:
What are all the modes of transportation listed in the following story? "I drive my car to the airport, then took a plane to London. From there, I took a train to my family's house. The next day, we took a ferry to the island."

Sample 3: Zero-shot prompt with a fine-tuned model

Model: mt0-xxl-13b
Decoding: Greedy
Repetition penalty: 1.0
Max tokens: 40
Prompt text:
What are all the modes of transportation listed in the following story? "I drive my car to the airport, then took a plane to London. From there, I took a train to my family's house. The next day, we took a ferry to the island."
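
If you'd rather run a sample from code than in the Prompt Lab UI, the settings above map roughly onto common generation parameters. Here's a hedged sketch of sample 3 using the Hugging Face transformers library; the parameter mapping is my assumption rather than Prompt Lab internals, and the small mt0 variant stands in for mt0-xxl-13b so the example runs on a laptop:

```python
# A sketch of sample 3 run with the Hugging Face transformers library.
# The parameter mapping below is my assumption, not how Prompt Lab works
# internally. bigscience/mt0-small stands in for the much larger
# bigscience/mt0-xxl.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "bigscience/mt0-small"  # stand-in for bigscience/mt0-xxl
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = (
    "What are all the modes of transportation listed in the following story? "
    '"I drive my car to the airport, then took a plane to London. From there, '
    "I took a train to my family's house. The next day, we took a ferry to the island.\""
)

output = model.generate(
    **tokenizer(prompt, return_tensors="pt"),
    do_sample=False,         # Decoding: Greedy
    repetition_penalty=1.0,  # Repetition penalty: 1.0
    max_new_tokens=40,       # Max tokens: 40
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With greedy decoding, the output should be close to what the full-size model produces in Prompt Lab, though a smaller stand-in model may answer less accurately.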
