Choosing a large language model for your project is like buying running shoes

Sarah Packowski
Nov 22, 2023

There are some basic attributes to consider. And getting advice helps narrow the options. But in the end, you need to try them to know what will work best.

Image of a running shoe with wings and a laser
Generated using DALL-E. Prompt: “An advertisement for space-aged running shoes with lazers (sic) and wings and teleportation in bold colors”

The first question everyone asks

At every large language model-related workshop, training session, or presentation I’ve given, people invariably ask:

  • “What model is the best one?”
  • “What model should I choose?”

The answer is: It depends. (But nobody likes that answer!)

Image of a shoe with wheels
Generated using DALL-E. Prompt: “An advertisement for magic running shoes that let you run faster than the speed of light hyper-realistic”

Buying running shoes

Let’s say you need to buy running shoes — for running, not for fashion. The wrong shoes can be uncomfortable and even contribute to injury.

Reading descriptions isn’t enough. If you look online, you’ll find countless different brands, styles, and features. You could read specifications, articles, blogs, and user reviews to make sense of it all. But it’s easy to become overwhelmed.

You need to narrow down the options. The best thing to do is find a store that specializes in running, has many different brands and styles of shoes in stock, and has trained staff (runners themselves) to help you. The salesperson will be able to narrow down your choices by asking some basic questions about your running, looking at your feet, and watching you run.

You have to run in them to really know. Once you’ve short-listed candidate shoes, you’ll have to run in them a bit to decide for yourself which ones are best for you. And there’s still a chance the ones you choose won’t work out in your actual running conditions.

Over time, you’ll develop an intuition for the characteristics of different brands. At that point, you’ll usually be able to guess on your own which shoes are likely to work for you.

Beware the simple answer. If someone says they can tell you what shoes to buy without looking at your feet, knowing anything about your running, seeing you run, or letting you try running in the shoes first… you should probably not buy shoes from them!

Generated using DALL-E. Prompt: “Oil painting of a herd of 10 magic running shoes running a race by themselves with butterflies in nymph colors”

Choosing a large language model for your project

For this post, let’s assume you don’t want to deploy the model yourself, but that you want to use a models-as-a-service solution like IBM watsonx.ai.

Choosing a large language model for your project has much in common with choosing running shoes:

  • Descriptions on their own are overwhelming, yet insufficient. For each model in watsonx.ai, there is a great deal of information available about the model’s architecture, pre-training data, and fine-tuning. That information can be very useful for understanding the quirks of a given model, but trying to choose a model based only on reading the model cards and academic papers is tough.
  • You can narrow down the choices based on simple criteria. For example, if you want to generate Python code, you can easily eliminate models that weren’t pre-trained and fine-tuned with that programming language. Or, if you want to use a back-and-forth conversational prompt technique, you can focus on the models that were optimized for that.
  • You need to test the short-listed models with real data. Remember too: New models come out pretty regularly. If you have a suite of tests you can easily run against a new model, you’ll know right away when switching to a new model is a good idea.
  • Choosing gets easier with experience. After several projects, you’ll get a feel for the character of each model and probably develop favourites.
  • Beware the simple answer. There is no one best model. They have different strengths and weaknesses. Be skeptical of anyone who says they can tell you which model is best for your project without knowing the details of your project.
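
To make the short-list-then-test idea concrete, here’s a minimal sketch that filters models on hard requirements and then runs a small test suite against each candidate. Everything in it is hypothetical: the model names and attributes in `CATALOG`, the test cases, and `fake_generate`, which stands in for a real call to your model provider (for example, the watsonx.ai Python SDK) so the sketch runs offline. The point is the shape: filter on simple criteria first, then score the short list on examples from your own project, so the same suite can be re-run whenever a new model appears.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ModelInfo:
    # Hypothetical catalog entry; real model cards carry far more detail.
    model_id: str
    languages: set = field(default_factory=set)
    chat_tuned: bool = False


# Hypothetical catalog: names and attributes are made up for illustration.
CATALOG = [
    ModelInfo("model-a", {"english", "python"}, chat_tuned=False),
    ModelInfo("model-b", {"english"}, chat_tuned=True),
    ModelInfo("model-c", {"english", "python"}, chat_tuned=True),
]


def shortlist(catalog, need_language=None, need_chat=False):
    """Keep only the models that meet the hard requirements."""
    return [
        m for m in catalog
        if (need_language is None or need_language in m.languages)
        and (not need_chat or m.chat_tuned)
    ]


@dataclass
class TestCase:
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable


def run_suite(model_ids, generate, cases):
    """Return the fraction of test cases each model passes."""
    scores = {}
    for model_id in model_ids:
        passed = sum(case.check(generate(model_id, case.prompt)) for case in cases)
        scores[model_id] = passed / len(cases)
    return scores


def fake_generate(model_id, prompt):
    # Hypothetical stand-in for a real API call so the sketch runs offline.
    if model_id == "model-c":
        return "def add(a, b):\n    return a + b"
    return "Here is some code: add(a, b)"


# In a real project, these cases would use real text from your project.
CASES = [
    TestCase("Write a Python function add(a, b).", lambda out: "def add" in out),
    TestCase("Write a Python function add(a, b) that returns the sum.",
             lambda out: "return" in out),
]

candidates = shortlist(CATALOG, need_language="python")
scores = run_suite([m.model_id for m in candidates], fake_generate, CASES)
for model_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{model_id}: {score:.0%} of cases passed")
```

When a new model is released, you’d add it to the catalog and re-run the same suite; its score tells you right away whether it’s worth switching.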

Sample

The following Python notebook demonstrates how to submit a prompt to all the models in watsonx.ai and then visually compare the results:

prompt-multiple-models.ipynb

Comparing LLM output in a notebook
Comparing LLM output in a Python notebook in watsonx.ai

(To create a notebook in watsonx.ai from the sample notebook, use the “Raw” URL from GitHub. See: Creating notebooks.)

The following video walks through using the sample notebook:

Comparing model output in a Python notebook
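
If you want to try the same idea outside the sample notebook, the following minimal sketch sends one prompt to several models and prints the outputs side by side. It is not the code from prompt-multiple-models.ipynb: the model IDs are made up, and each `generate` callable is a hypothetical stand-in for a call to your model provider. Keeping the comparison logic separate from the API calls makes it easy to add a new model to the comparison later.

```python
import textwrap
from typing import Callable, Dict


def compare(prompt: str, models: Dict[str, Callable[[str], str]], width: int = 30) -> str:
    """Send one prompt to every model and lay the outputs out in columns."""
    outputs = {name: generate(prompt) for name, generate in models.items()}
    # Wrap each output so it fits its column; keep an empty line for empty output.
    wrapped = {name: textwrap.wrap(text, width) or [""] for name, text in outputs.items()}
    height = max(len(lines) for lines in wrapped.values())
    header = " | ".join(name.ljust(width) for name in wrapped)
    rule = "-+-".join("-" * width for _ in wrapped)
    rows = []
    for i in range(height):
        # Pad shorter columns with blanks so every row lines up.
        cells = [(lines[i] if i < len(lines) else "").ljust(width) for lines in wrapped.values()]
        rows.append(" | ".join(cells))
    return "\n".join([header, rule, *rows])


# Hypothetical stand-ins for real model calls.
models = {
    "model-a": lambda p: "Positive.",
    "model-b": lambda p: "The sentiment of this review is positive because "
                         "the customer praises the fast delivery.",
}

print(compare("Classify the sentiment: 'Fast delivery, great product!'", models))
```

Seeing a terse answer next to a verbose one for the same prompt is exactly the kind of difference that only shows up when you run your own text through each model.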

Conclusion

You can get almost any model to perform almost any task with enough prompt savvy. However, the quality of the results, and their suitability for your project, will vary. Because it can be difficult to predict exactly how a given model will respond to your specific content, the only way to know which model will work best for a project is to test with real text from your project. But don’t take my word for it… try the models and see for yourself!
