When and why would you use an LLM for text classification?

Sarah Packowski
5 min read · Nov 12, 2023


Smaller, faster, cheaper models can perform text classification very effectively. So why would it make sense to haul out a large language model?

Samples: https://github.com/spackows/MURAL-API-Samples

Existing method: Create a custom text classifier

Before generative models burst onto the scene, you could easily create a custom text classifier. The following .gif shows how to create one in just a few minutes using IBM Natural Language Understanding.

Creating a custom text classifier using IBM Natural Language Understanding

New method: Prompt a pre-trained LLM

For standard, common-sense classes, you can prompt an LLM to classify text based on only the class names:

Classifying text based on class names in the Prompt Lab in IBM watsonx.ai

*Black text is what I typed; highlighted text is output from the model.
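As a rough sketch, a zero-shot prompt like this can be assembled from nothing but the class names. The class names and sample message below are my own illustrative examples, not the ones from the screenshot:

```python
# Build a zero-shot classification prompt from class names only.
# CLASS_NAMES and the sample message are illustrative assumptions.
CLASS_NAMES = ["complaint", "question", "praise"]

def build_zero_shot_prompt(message: str) -> str:
    names = ", ".join(CLASS_NAMES)
    return (
        f"Classify the following message into one of these classes: {names}.\n\n"
        f"Message: {message}\n"
        "Class:"
    )

prompt = build_zero_shot_prompt("My order arrived broken.")
print(prompt)
```

Sending a prompt like this to a text-generation endpoint typically causes the model to complete the line after "Class:" with one of the listed names.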

For more nuanced classes, you could classify text based on the class descriptions:

Classifying text based on class descriptions in the Prompt Lab in watsonx.ai
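A description-based prompt follows the same pattern, with each class name paired with a short definition. The classes and descriptions here are illustrative assumptions, not the ones in the screenshot:

```python
# Build a classification prompt from class descriptions.
# The class names and descriptions are illustrative assumptions.
CLASS_DESCRIPTIONS = {
    "urgent": "needs a response within one business day",
    "routine": "can wait for the normal support queue",
}

def build_description_prompt(message: str) -> str:
    # One "- name: description" line per class
    lines = [f"- {name}: {desc}" for name, desc in CLASS_DESCRIPTIONS.items()]
    return (
        "Classify the following message into one of these classes:\n"
        + "\n".join(lines)
        + f"\n\nMessage: {message}\nClass:"
    )

prompt = build_description_prompt("Our production site is down!")
print(prompt)
```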

You could also define the classes with a handful of examples in a few-shot prompt:

Classifying text based on class exemplars in the Prompt Lab in watsonx.ai
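In a few-shot prompt, the classes are defined implicitly by labelled examples rather than by names or descriptions. The exemplars below are illustrative assumptions:

```python
# Build a few-shot prompt: classes are defined by labelled exemplars.
# The exemplars and labels are illustrative assumptions.
EXEMPLARS = [
    ("The app crashes when I upload a photo", "bug report"),
    ("Could you add a dark mode?", "feature request"),
]

def build_few_shot_prompt(message: str) -> str:
    # Each exemplar becomes a completed "Message/Class" pair;
    # the new message is left for the model to complete.
    shots = "\n\n".join(f"Message: {t}\nClass: {c}" for t, c in EXEMPLARS)
    return f"{shots}\n\nMessage: {message}\nClass:"

prompt = build_few_shot_prompt("Search returns no results for valid queries")
print(prompt)
```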

For domain-specific, unusual, or esoteric classes, you could fine-tune an LLM with the same kind of training data you’d use to create a custom classifier. But fine-tuning takes about as much effort as creating a custom classifier, and it is probably more expensive, so let’s leave the fine-tuning scenario out of this discussion.

IBM watsonx.ai

The images above show Prompt Lab, a tool in watsonx.ai for experimenting with prompting LLMs.

Pros and Cons of each method

Custom classifier:

  • ✓ Cost — Usually cheaper to build, train, and use than an LLM
  • ✓ Speed — Can be faster than an LLM
  • ✓ Performance — For unusual classes, can be more successful
  • ✓ Ease of use — For unusual classes, easier to use than a few-shot prompt
  • ✗ Effort — Collecting labelled training data can be time-consuming
  • ✗ Maintenance — Keeping the model up-to-date as needs change can be a challenge

Pre-trained large language model:

  • ✓ Intuitive — Typing “classify the following message into one of three classes…” can feel more intuitive
  • ✓ Flexible — Using an LLM to perform this task is much more flexible (you can add a fourth class, change the class descriptions to experiment, or give special instructions in your prompt)
  • ✓ Effort — For common-sense classes, labelled training data isn’t needed
  • ✗ Cost — LLMs are expensive to build, pre-train, and use
  • ✗ Bias — Biases in pre-training data sets become encoded in LLMs, which means classification might be biased as well

When using an LLM makes the most sense

The flexibility of the LLM method makes it very attractive for cases where you need to do ad-hoc or experimental classification:

  • You only need to perform the task infrequently or for only one activity
  • You don’t have dozens of labelled examples already
  • The classes might change
  • You want the output to be more elaborate than just the class name

Example

Imagine you are a team lead and you are taking your team through a reflection activity. Your team has used a mural[1] to collect thoughts about what things the team should stop doing, continue doing, and start doing. Now, you need to cluster the sticky notes in the mural by category for further analysis.

[1] Mural is an online tool that is like a virtual whiteboard: you can draw shapes, add sticky notes, and move things around. It’s a fabulous tool for visually organizing ideas, designing solutions, and collaborating with teammates — in real time or asynchronously. I love using MURAL for team collaboration!

Using a large language model to classify sticky notes based on class descriptions or exemplars

Here are three Python notebooks demonstrating how to use the MURAL API and a large language model in watsonx.ai to classify team reflection ideas:

1. Use a large language model to classify sticky notes based on class names

Classifying sticky notes based on class name

2. Use a large language model to classify sticky notes based on class descriptions

Classifying sticky notes based on class descriptions

3. Use a large language model to classify sticky notes based on class exemplars

Classifying sticky notes based on class exemplars
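The overall flow in those notebooks can be sketched as follows. The MURAL API call and the LLM call are stubbed out here with hypothetical placeholders — the sticky-note texts are made up, and `classify` is a stand-in for a real request to a model in watsonx.ai:

```python
from collections import defaultdict

# Sticky-note texts would come from the MURAL API in the real notebooks;
# these are made-up examples for illustration.
STICKY_NOTES = [
    "Stop scheduling meetings over lunch",
    "Continue the weekly demo sessions",
    "Start pairing on tricky bugs",
]

def classify(note: str) -> str:
    """Stand-in for an LLM call: a real implementation would send a
    classification prompt to a model and parse the generated class name."""
    for label in ("stop", "continue", "start"):
        if note.lower().startswith(label):
            return label
    return "unclassified"

def cluster_notes(notes):
    """Group sticky notes by their predicted class for further analysis."""
    clusters = defaultdict(list)
    for note in notes:
        clusters[classify(note)].append(note)
    return dict(clusters)

clusters = cluster_notes(STICKY_NOTES)
print(clusters)
```

Swapping the stubbed `classify` for a real model call is the only change needed to move from this sketch to any of the three prompting styles above.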

To use these sample notebooks, follow the instructions in the samples repo: https://github.com/spackows/MURAL-API-Samples

Conclusion

When LLMs first became all the rage, I wondered why you would ever want to use an LLM to perform a task that smaller, faster, cheaper models could do as well or better. But now I have solutions in production that use custom classifiers, solutions that use LLMs, and even some solutions that use both for different steps. It’s just a case of weighing the pros and cons of each method for a given job.


Sarah Packowski

Design, build AI solutions by day. Experiment with input devices, drones, IoT, smart farming by night.