Natural language interfaces powered by large language models

Sarah Packowski
12 min read · Dec 14, 2024


RAG is good for so much more than question answering!

Natural language interfaces

Natural language (NL) interfaces enable users to interact with systems using natural language, as opposed to a rigid command syntax or the buttons and fields of a graphical user interface. The following video dives into different aspects of NL interfaces:

Introduction to natural language interfaces

Why building NL interfaces has been so difficult

NL interfaces are not new. Researchers have been studying NL interfaces and how people interact with these systems for more than 50 years! Here’s a fun list of NL interfaces over the years: Natural language user interface: History (Wikipedia)

If researchers have been working on NL interfaces for so long, why aren’t there NL interfaces everywhere?

Using traditional natural language processing (NLP) techniques to translate messy, ungrammatical NL input into precise commands that a machine can execute is incredibly difficult. Traditionally, the task has been approached by identifying parts of speech and extracting keywords and domain-specific entities using hand-curated dictionaries. Then, meaning is teased from that information with grammar routines. Special functions handle challenges like anaphora and ellipsis. And finally, the extracted meaning is mapped to commands. More recently, components to manage conversation flow have been added to the mix as well. The following image shows diagrams depicting this workflow in related research papers from the 1970s to the 2020s:

Traditional methods for translating messy, vague natural language input into precise instructions a machine can follow. (source1, source2, source3, source4)

Wow! That’s a lot of work. And it was only marginally successful for simple input with restricted domains.

Large language models to the rescue

Pre-trained large language models (LLMs) are the perfect tool for tackling the translate-unpredictable-input-to-output-in-a-desired-format challenge of building NL interfaces:

Adaptive classification — Unlike a traditional text classification model, LLMs can perform ad-hoc classification with little to no training data. The following prompt causes an LLM to classify user input based only on class descriptions (the generated output is shown after the prompt):

[ prompt ]

Classify the following user input as one of: "question", "action_request",
or "none"

question: The user input is asking a question.

action_request: The user input is an instruction to take an action.

none: The user input is not a question and not an instruction.

User input: "filter out everything after the end of January, 2024"
Class:


[ generated output ]

action_request

Logic — LLMs can perform pseudo logic (following patterns, really). The following prompt causes an LLM to identify which part of a web app interface a given user input is about (the generated output is shown after the prompt):

[ prompt ]

A web interface has the following elements:

Element: filter_icon_div
Description: Open filters menu button. Clicking this opens the filters menu

Element: filters_close_x
Description: Close filters menu button. Clicking this closes the filters menu

Element: first_date_input
Description: First date filter input. Setting this to a date causes
question-answer pairs before this date to be filtered out of the view

Element: final_date_input
Description: Final date filter input. Setting this to a date causes
question-answer pairs after this date to be filtered out of the view

...

Which element is the following user message about?
"filter out everything after the end of January, 2024"

Element:


[ generated output ]

final_date_input

Robust entity extraction — LLMs can extract entities without the need for exhaustive dictionaries. The following prompt causes an LLM to extract a date value from given user input (the generated output is shown after the prompt):

[ prompt ]

A web interface has the following elements:

...

Element: final_date_input
Description: Final date filter input. Setting this to a date causes
question-answer pairs after this date to be filtered out of the view

...

Which element is the following user message about?
"filter out everything after the end of January, 2024"

Element: final_date_input

What value should the element be set to?

A note about dates:
- Date values must always be specified like YYYY-MM-DD. For example,
'January 17th, 2022' would be: 2022-01-17. And '2020 Feb. 13th'
should be: 2020-02-13.
- If no year is given, assume the current year. The current year is
2024. So, January 17th would be: 2024-01-17 and Feb. 20th would
be: 2024-02-20

Value:


[ generated output ]

2024-01-31

Generating output in a desired format — The previous example also demonstrates the ability of LLMs to generate output in a format you specify in your prompt.

Translation — The examples above don’t deal with query syntax like SQL. But to an LLM, translating an NL query to SQL is not very different from translating English to French.

Because of these advantages of LLMs, NL interface features are making their way into many applications.

But a problem is emerging

Teams who are keen to build NL interfaces are replacing GUI features with chatbots. Alas, when faced with the flashing cursor of a chatbot input, today’s users have much in common with users of UNIX command-line interfaces of old, asking: “What should I type here?”

Have user interfaces come full circle? Back to a blinking cursor?

Desperate teams are adding suggestions to their chatbot interfaces, like: “ask me about X” or “try searching for Y”. But there’s a better way to set NL interfaces up for success…

A simple method for building intuitive NL interfaces

Here’s a method for building NL interfaces that is simple to implement and results in interfaces that are intuitive for users:

  1. Build a typical graphical user interface (GUI)
  2. Collect metadata about the elements in your GUI
  3. Create prompts to analyze user input:
    - Classify user input
    - Detect which GUI element is related to user input
    - Identify values in user input
  4. Add application logic to respond to user input:
    - Answer NL questions
    - Activate GUI elements

Example

Imagine we are building a web app for viewing customer questions and answers. To make it easy to view question-answer pairs for a given time range, the app will be able to filter on a first day and a final day.

Here’s how we might apply this NL interface building method to our app:

1: Build a typical graphical user interface (GUI)

Make sure the design of the GUI matches users’ mental model and provides straightforward, intuitive methods for performing expected tasks.

For our hypothetical app, the filter menu might look like this:

A filter menu with a first date input and a final date input, a reload button to apply the specified date range, and an X to close the menu.

The elements in your GUI provide much-needed cues for users — influencing users’ mental model, guiding their language, shaping their behavior, and suggesting what tasks can be performed.

2: Collect metadata about the elements in your GUI

For each element in our GUI, we need to collect metadata like the following:

  • A meaningful name for the element
  • A short description of what the element is for and why you would use it
  • More detailed “articles” about using the element in tasks
  • The DOM identifier of the element
  • An array of functions that should be called when the element is activated

For our example app, the GUI element metadata might look like this:

[
  {
    "element_name": "First date filter input",
    "task_description": "Setting this to a date causes question-answer pairs before this date to be filtered out of the view",
    "article_txt_arr": [
      "You can filter the question-answer pairs shown by date.",
      "For example, you can filter out any question-answer pairs before a given date by setting the first date filter input.",
      "In other words, you can show only question-answer pairs starting on or after a given date by setting the first date filter input."
    ],
    "dom_id": "first_date_input",
    "actions_arr": [
      {
        "function": "openFiltersMenu",
        "values_needed_arr": []
      },
      {
        "function": "setFirstDateInput",
        "values_needed_arr": [ "date_str" ]
      }
    ]
  },
  {
    "element_name": "Final date filter input",
    "task_description": "Setting this to a date causes question-answer pairs after this date to be filtered out of the view",
    "article_txt_arr": [
      "You can filter the question-answer pairs shown by date.",
      "For example, you can filter out any question-answer pairs after a given date by setting the final date filter input.",
      "In other words, you can show only question-answer pairs starting before or on a given date by setting the final date filter input."
    ],
    "dom_id": "final_date_input",
    "actions_arr": [
      {
        "function": "openFiltersMenu",
        "values_needed_arr": []
      },
      {
        "function": "setFinalDateInput",
        "values_needed_arr": [ "date_str" ]
      }
    ]
  },
  {
    "element_name": "Page reload button",
    "task_description": "Clicking this button causes the page to reload",
    "article_txt_arr": [
      "You can reload the page with the selected filters by clicking the reload page button."
    ],
    "dom_id": "reload_btn",
    "actions_arr": [
      {
        "function": "reloadPage",
        "values_needed_arr": []
      }
    ]
  },
  {
    "element_name": "Close filters menu button",
    "task_description": "Clicking this closes the filters menu",
    "article_txt_arr": [
      "You can close the filter menu by clicking the close filter X."
    ],
    "dom_id": "filters_close_x",
    "actions_arr": [
      {
        "function": "closeFiltersMenu",
        "values_needed_arr": []
      }
    ]
  }
]
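
If your app is written in TypeScript, a minimal sketch of types for this metadata might look like the following. The interface and field names simply mirror the JSON above; this is an illustration, not a required schema, and later sketches in this article build on these types.

// Types mirroring the GUI elements metadata shown above.
interface ElementAction {
  function: string;             // name of the app function to call
  values_needed_arr: string[];  // names of the values the function expects
}

interface GuiElementMetadata {
  element_name: string;         // human-readable name
  task_description: string;     // short description, reused in prompts
  article_txt_arr: string[];    // grounding text for question answering
  dom_id: string;               // DOM identifier, used as the element key
  actions_arr: ElementAction[]; // functions to run when the element is activated
}

// The metadata itself could be a constant or loaded from a JSON file.
declare const guiElements: GuiElementMetadata[];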

3: Create prompts to analyze user input

3.1: Classify user input

When a user submits NL input, our NL interface needs to distinguish between different user needs (or intents). For example, a user might be asking a how-to question to get an answer back, or they might be giving an instruction to cause an action.

For our example app, the following prompt can be used to cause an LLM to classify user input. At run time, the user input would be substituted for the user input placeholder:

Classify the following user input as one of: "question", "action_request", 
or "none"

question: The user input is asking a question.

action_request: The user input is an instruction to take an action.

none: The user input is not a question and not an instruction.

User input: __PLACEHOLDER_FOR_USER_INPUT__
Class:
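
As a minimal sketch, here is how this classification step might be wired up in TypeScript. The generate function is a placeholder for whatever LLM client you use; the important part is that application logic only accepts the known classes.

// Placeholder for your LLM client; assumed to return the raw completion text.
declare function generate(prompt: string): Promise<string>;

type InputClass = "question" | "action_request" | "none";

const CLASSIFY_PROMPT = `Classify the following user input as one of: "question", "action_request",
or "none"

question: The user input is asking a question.

action_request: The user input is an instruction to take an action.

none: The user input is not a question and not an instruction.

User input: __PLACEHOLDER_FOR_USER_INPUT__
Class:`;

async function classifyUserInput(userInput: string): Promise<InputClass> {
  const prompt = CLASSIFY_PROMPT.replace("__PLACEHOLDER_FOR_USER_INPUT__", userInput);
  const output = (await generate(prompt)).trim();
  // Accept only the known classes; treat anything else as "none" so
  // unexpected LLM output never reaches application logic.
  if (output === "question" || output === "action_request") {
    return output;
  }
  return "none";
}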

3.2: Detect which GUI element is related to user input

When a user asks a question or makes an action request, we must identify which GUI element the input is related to.

If the design of your GUI aligns well to users’ tasks, then the input users are likely to submit will be related to at least one GUI element.

For our example app, the following prompt can be used to identify which GUI element is most closely related to given user input. At run time, the element details placeholder can be pulled from our GUI elements metadata:

A web interface has the following elements:
__PLACEHOLDER_FOR_ELEMENT_DETAILS__

Which element is the following user message about?
__PLACEHOLDER_FOR_USER_INPUT__

Element:
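
Here is a sketch of how the element details placeholder might be filled in from the GUI element metadata, and the generated output validated against known DOM identifiers (reusing the hypothetical generate helper and types from the earlier sketches):

// Render each element's metadata in the format the prompt expects.
function buildElementDetails(elements: GuiElementMetadata[]): string {
  return elements
    .map((el) => `Element: ${el.dom_id}\nDescription: ${el.task_description}`)
    .join("\n\n");
}

async function detectElement(
  userInput: string,
  elements: GuiElementMetadata[]
): Promise<GuiElementMetadata | null> {
  const prompt = `A web interface has the following elements:
${buildElementDetails(elements)}

Which element is the following user message about?
"${userInput}"

Element:`;
  const output = (await generate(prompt)).trim();
  // Only move forward if the output is a DOM identifier we actually know about.
  return elements.find((el) => el.dom_id === output) ?? null;
}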

3.3: Identify values in user input

If the user is giving an instruction that results in a value being assigned to a GUI element, we need to extract the value from the user input. And we need that value to be expressed in a way that can be used programmatically. For example, the number “five” will need to be the digit “5”.

For our example app, the following prompt can be used to extract any values specified in user input. This prompt is made up of the previous prompt, the output that prompt generated, and an extra piece causing the LLM to identify a value:

A web interface has the following elements:
__PLACEHOLDER_FOR_ELEMENT_DETAILS__

Which element is the following user message about?
__PLACEHOLDER_FOR_USER_INPUT__

Element: __PLACEHOLDER_FOR_ELEMENT_ID__

What value should the element be set to?

Value:
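
A sketch of the value-extraction step, reusing the previous prompt plus the detected element. Validating the format in application code (here, YYYY-MM-DD for our date inputs) keeps malformed output from being used programmatically:

async function extractValue(
  userInput: string,
  elements: GuiElementMetadata[],
  element: GuiElementMetadata
): Promise<string | null> {
  const prompt = `A web interface has the following elements:
${buildElementDetails(elements)}

Which element is the following user message about?
"${userInput}"

Element: ${element.dom_id}

What value should the element be set to?

Value:`;
  const output = (await generate(prompt)).trim();
  // Our example app only extracts dates, so check for YYYY-MM-DD.
  return /^\d{4}-\d{2}-\d{2}$/.test(output) ? output : null;
}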

4: Add application logic to respond to user input

4.1: Answer NL questions

If user input is classified as a question (prompt 3.1 above), our app must determine which element the question is about (prompt 3.2). Then, the app needs to prompt an LLM to answer the question, grounded in information about the related element.

For our app, the following prompt can be used to answer users’ questions grounded in information pulled in from the GUI elements metadata:

Article:
-------
## Element name
__ELEMENT_PLACEHOLDER__

## Task element performs
__TASK_PLACEHOLDER__

## About this element
__ARTICLE_PLACEHOLDER__
-------

Answer the following question using only information from the article.

Question: __USER_INPUT_PLACEHOLDER__

Answer:
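
A sketch of assembling this grounded question-answering prompt from the element metadata (same hypothetical generate helper as before):

async function answerQuestion(
  userInput: string,
  element: GuiElementMetadata
): Promise<string> {
  const prompt = `Article:
-------
## Element name
${element.element_name}

## Task element performs
${element.task_description}

## About this element
${element.article_txt_arr.join("\n")}
-------

Answer the following question using only information from the article.

Question: ${userInput}

Answer:`;
  return (await generate(prompt)).trim();
}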

4.2: Activate GUI elements

When user input is classified as an action request (prompt 3.1), the app needs to perform these steps:

  1. Identify which element the action request is related to (prompt 3.2)
  2. Extract any value specified in the user input (prompt 3.3)
  3. Execute the functions listed in the actions_arr specified in the metadata for the related element

Imagine our app receives the following user input:

Filter out everything after the end of January

This input would be handled by our app like this:

  • The related element is final_date_input
  • The value is 2024-01-31
  • The functions to be executed are: openFiltersMenu and setFinalDateInput (with the date string value passed in)

To reload the page with the filters applied, the user would need to click the Reload button (or give NL input like “reload the page”.)
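
A sketch of the dispatch step might look like this: a registry maps the function names listed in actions_arr to real implementations. (The function bodies here are stubs standing in for our hypothetical app's real code.)

// Registry mapping metadata function names to app implementations.
const actionRegistry: Record<string, (...values: string[]) => void> = {
  openFiltersMenu: () => { /* open the filters menu */ },
  closeFiltersMenu: () => { /* close the filters menu */ },
  setFirstDateInput: (dateStr) => { /* set the first date field to dateStr */ },
  setFinalDateInput: (dateStr) => { /* set the final date field to dateStr */ },
  reloadPage: () => { /* reload the page with the current filters */ },
};

function executeActions(element: GuiElementMetadata, value: string | null): void {
  for (const action of element.actions_arr) {
    const fn = actionRegistry[action.function];
    if (!fn) continue; // ignore unknown function names from metadata
    // Pass the extracted value only to functions that declare they need one.
    if (action.values_needed_arr.length > 0 && value !== null) {
      fn(value);
    } else {
      fn();
    }
  }
}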

When users see fields in your GUI being populated or activated, it helps them understand what’s happening and how the system is responding to their input.

Scaling out this approach

The example above is very simple. Here are considerations for applying this method to building NL interfaces for real apps.

More complex user input

  • Different types of input — There will be other types of user input, beyond questions and action requests. And the types of user input will vary, depending on the nature of the app.
  • Complex, multi-hop input — The example user input above related to only one element at a time. In reality, your users will give instructions that relate to multiple elements and that require multiple actions in response. One strategy for handling these cases is to preprocess user input to tease it apart into its constituent pieces (see the sketch after this list).
  • Multiple input values — The example user input above included only one value (a date string) that needed to be passed to a function. In reality, some functions will need multiple parameters. When the parameters are of different data types, one strategy for handling these cases is to separate the extracted values by data type.
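
As a rough sketch, that preprocessing might itself be an LLM call that rewrites compound input as one instruction per line. The prompt wording here is illustrative, not from a production app:

// Split compound user input into single-step instructions, one per line.
async function decomposeInput(userInput: string): Promise<string[]> {
  const prompt = `Rewrite the following user input as a numbered list of
single-step instructions, one instruction per line.

User input: "${userInput}"

Instructions:`;
  const output = await generate(prompt);
  return output
    .split("\n")
    .map((line) => line.replace(/^\d+[.)]\s*/, "").trim())
    .filter((line) => line.length > 0);
}

// For example, "filter from Jan 1 to Jan 31 and reload" might come back as
// three instructions, each of which is then classified and handled on its own.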

More complex apps

  • Many GUI elements — Most apps will have so many elements that you cannot get good results pulling all the element metadata into prompts at once. (The LLMs will get lost in the middle of all that information.) So your app will need to search for a subset of likely elements and include metadata for only those elements in prompts (see the sketch after this list).
  • Generating element metadata — After your GUI is built, the bulk of your element metadata can be automatically extracted from application code.
  • Leveraging documentation — Instead of having hard-coded articles in article_txt_arr, it’s more likely your apps would pull in articles from elsewhere, such as documentation, blog posts, etc.
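
One way to narrow the candidate set before prompting is a lightweight search over the element metadata. The keyword scoring below is a deliberately naive illustration; a real app might use embeddings instead:

// Naive keyword scoring to pick the top-k candidate elements for a prompt.
function findCandidateElements(
  userInput: string,
  elements: GuiElementMetadata[],
  k = 5
): GuiElementMetadata[] {
  const words = userInput.toLowerCase().split(/\W+/).filter(Boolean);
  return elements
    .map((el) => {
      const text = [el.element_name, el.task_description, ...el.article_txt_arr]
        .join(" ")
        .toLowerCase();
      // Score: how many words from the user input appear in the metadata text.
      return { el, score: words.filter((w) => text.includes(w)).length };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.el);
}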

UX design

  • Separate filling in fields from submitting actions — In the example above, when the user asked to filter out all data after the end of January, the final_date_input field was filled in, but the Reload button was not fired. It would have been simple to add the reloadPage function to actions_arr of the final_date_input element. But sometimes, taking two steps like this makes it easier for the user to see how the system is responding.
  • Non-reversible actions — Consider whether some actions just shouldn’t be allowed through the NL interface. Just as GUIs use confirmation pop-ups today, it would be wise to implement some mechanism to prevent a misunderstood instruction from permanently deleting files, transferring all the money out of an account, replying to all with an email, or anything else potentially dangerous and non-reversible.
  • Out-of-context input — Users will ask questions about things beyond the scope of your app and instruct the app to perform tasks that aren’t part of its programming. If you try to crowbar all input through logic intended for in-scope input, users will be unhappy with the results and keep rephrasing their input to try to get the app to do what they want. Instead, it’s best to build a component that can identify when input is out of scope and then relay that information to the user.
  • LLM error handling — The suggested method above is structured to reduce the potential for and impact of LLM hallucination. For example, when the sample app above is classifying user input, if the LLM generates anything other than question or action_request, then the application logic could instruct the user to rephrase the input. Similarly, when identifying the related element, the app would only move forward if the generated output is a valid DOM identifier listed in the GUI elements metadata. In both cases, even if the LLM hallucinated bizarre or hateful output, the application logic would prevent user harm. For question answering, there are multiple strategies for identifying harmful output in order to not return that output. In production apps, build safeguards for each step where an LLM generates output.
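
Pulling the earlier sketches together, a top-level handler might guard each step and relay a fallback message to the user instead of pressing on with bad output. (The fallback wording is illustrative.)

async function handleUserInput(
  userInput: string,
  elements: GuiElementMetadata[]
): Promise<string> {
  const inputClass = await classifyUserInput(userInput);
  if (inputClass === "none") {
    return "Sorry, I didn't understand that. Could you rephrase it?";
  }
  const element = await detectElement(userInput, elements);
  if (!element) {
    return "That doesn't seem to relate to anything in this app.";
  }
  if (inputClass === "question") {
    return answerQuestion(userInput, element);
  }
  // action_request: extract a value only if some action needs one.
  const needsValue = element.actions_arr.some(
    (a) => a.values_needed_arr.length > 0
  );
  const value = needsValue
    ? await extractValue(userInput, elements, element)
    : null;
  if (needsValue && value === null) {
    return "I couldn't work out the value to use. Could you rephrase it?";
  }
  executeActions(element, value);
  return `Done: ${element.element_name} updated.`;
}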

Conclusion

In time, common NL interface patterns will emerge that users will recognize and even come to expect. But for now, when we are at the beginning of the transition to widespread adoption of NL interfaces, combining GUI elements with NL input is the best of both worlds: the structure of the GUI design helps us build the NL interface, and the elements of the GUI give users cues about what tasks they can perform and how to talk about them. Like having a safety net, having a GUI makes it possible to be more daring and innovative with your NL interface.

The Flying Codonas of Mexico, The Flying Concellos of the United States, and Les Amadori of Italy, 1932 (source). These athletes were the best in the world. They could push their performance to the limits of their ability, because a safety net made risk-taking possible.

Written by Sarah Packowski

Design, build AI solutions by day. Experiment with input devices, drones, IoT, smart farming by night.
