Natural language interfaces powered by large language models
RAG is good for so much more than question answering!
Natural language interfaces
Natural language (NL) interfaces enable users to interact with systems using natural language (as opposed to using a rigid command syntax or the buttons and fields of a graphical user interface). The following video dives into different aspects of NL interfaces:
Why building NL interfaces has been so difficult
NL interfaces are not new. Researchers have been studying NL interfaces and how people interact with these systems for more than 50 years! Here’s a fun list of NL interfaces over the years: Natural language user interface: History (Wikipedia)
If researchers have been working on NL interfaces for so long, why aren’t there NL interfaces everywhere?
Using traditional natural language processing (NLP) techniques to translate messy, ungrammatical NL input into precise commands that a machine can execute is incredibly difficult. Traditionally, the task has been approached by identifying parts of speech and extracting keywords and domain-specific entities using hand-curated dictionaries. Then, meaning is teased from that information with grammar routines. Special functions handle challenges like anaphora and ellipsis. And finally, the extracted meaning is mapped to commands. More recently, components to manage conversation flow have been added to the mix as well. The following image shows diagrams depicting this workflow in related research papers from the 1970s to the 2020s:
Wow! That’s a lot of work. And it was only marginally successful, even for simple input in restricted domains.
Large language models to the rescue
Pre-trained large language models (LLMs) are the perfect tool for tackling the translate-unpredictable-input-to-output-in-a-desired-format challenge of building NL interfaces:
Adaptive classification — Unlike a traditional text classification model, LLMs can perform ad-hoc classification with little to no training data. The following prompt causes an LLM to classify user input based only on class descriptions (generated output in bold):
[ prompt ]
Classify the following user input as one of: "question", "action_request",
or "none"
question: The user input is asking a question.
action_request: The user input is an instruction to take an action.
none: The user input is not a question and not an instruction.
User input: "filter out everything after the end of January, 2024"
Class:
[ generated output ]
action_request
Logic — LLMs can perform pseudo-logic (following patterns, really). The following prompt causes an LLM to identify which part of a web app’s interface a given user input is about (generated output in bold):
[ prompt ]
A web interface has the following elements:
Element: filter_icon_div
Description: Open filters menu button. Clicking this opens the filters menu
Element: filters_close_x
Description: Close filters menu button. Clicking this closes the filters menu
Element: first_date_input
Description: First date filter input. Setting this to a date causes
question-answer pairs before this date to be filtered out of the view
Element: final_date_input
Description: Final date filter input. Setting this to a date causes
question-answer pairs after this date to be filtered out of the view
...
Which element is the following user message about?
"filter out everything after the end of January, 2024"
Element:
[ generated output ]
final_date_input
Robust entity extraction — LLMs can extract entities without the need for exhaustive dictionaries. The following prompt causes an LLM to extract a date value from given user input (generated output in bold):
[ prompt ]
A web interface has the following elements:
...
Element: final_date_input
Description: Final date filter input. Setting this to a date causes
question-answer pairs after this date to be filtered out of the view
...
Which element is the following user message about?
"filter out everything after the end of January, 2024"
Element: final_date_input
What value should the element be set to?
A note about dates:
- Date values must always be specified like YYYY-MM-DD. For example,
'January 17th, 2022' would be: 2022-01-17. And '2020 Feb. 13th'
should be: 2020-02-13.
- If no year is given, assume the current year. The current year is
2024. So, January 17th would be: 2024-01-17 and Feb. 20th would
be: 2024-02-20
Value:
[ generated output ]
2024-01-31
Generating output in a desired format — The previous example also demonstrates the ability of LLMs to generate output in a format you specify in your prompt.
Translation — The examples above don’t deal with query syntax like SQL. But to an LLM, translating an NL query to SQL is not very different from translating English to French.
Because of these advantages of LLMs, NL interface features are making their way into many applications.
But a problem is emerging
Teams who are keen to build NL interfaces are replacing GUI features with chatbots. Alas, when faced with the flashing cursor of a chatbot input, today’s users have much in common with users of UNIX command-line interfaces of old, asking: “What should I type here?”
Desperate teams are adding suggestions in their chatbot interface, like: “ask me about X” or “try searching for Y”. But there’s a better way to set NL interfaces up for success…
A simple method for building intuitive NL interfaces
Here’s a method that is simple to implement and results in NL interfaces that are intuitive for users:
- Build a typical graphical user interface (GUI)
- Collect metadata about the elements in your GUI
- Create prompts to analyze user input:
- Classify user input
- Detect which GUI element is related to user input
- Identify values in user input
- Add application logic to respond to user input:
- Answer NL questions
- Activate GUI elements
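Before walking through an example, here’s a minimal sketch of how these steps might fit together in TypeScript. The helper names (classifyInput, detectRelatedElement, and so on) are hypothetical placeholders; each one is fleshed out in the steps below:

type InputClass = "question" | "action_request" | "none";

// Hypothetical helpers, one per step of the method (sketched in the sections below).
declare function classifyInput(userInput: string): Promise<InputClass>;                       // step 3.1
declare function detectRelatedElement(userInput: string): Promise<string | null>;             // step 3.2
declare function extractValue(userInput: string, elementId: string): Promise<string | null>;  // step 3.3
declare function answerQuestion(userInput: string, elementId: string): Promise<string>;       // step 4.1
declare function activateElementById(elementId: string, value: string | null): Promise<void>; // step 4.2

async function handleUserInput(userInput: string): Promise<string> {
  const inputClass = await classifyInput(userInput);
  const elementId = inputClass === "none" ? null : await detectRelatedElement(userInput);
  if (elementId === null) {
    return "Sorry, I didn't understand that. Could you rephrase?";
  }
  if (inputClass === "question") {
    return answerQuestion(userInput, elementId);
  }
  const value = await extractValue(userInput, elementId);
  await activateElementById(elementId, value);
  return "Done.";
}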
Example
Imagine we are building a web app for viewing customer questions and answers. To make it easy to view question-answer pairs for a given time range, the app will be able to filter on a first day and a final day.
Here’s how we might apply this NL interface building method to our app:
1: Build a typical graphical user interface (GUI)
Make sure the design of the GUI matches users’ mental model and provides straightforward, intuitive methods for performing expected tasks.
For our hypothetical app, the filter menu might look like this:
The elements in your GUI provide much-needed cues for users — influencing users’ mental model, guiding their language, shaping their behavior, and suggesting what tasks can be performed.
2: Collect metadata about the elements in your GUI
For each element in our GUI, we need to collect metadata like the following:
- A meaningful name for the element
- A short description of what the element is for and why you would use it
- More detailed “articles” about using the element in tasks
- The DOM identifier of the element
- An array of functions that should be called when the element is activated
For our example app, the GUI elements metadata might look like this:
[
{
"element_name": "First date filter input",
"task_description": "Setting this to a date causes question-answer pairs before this date to be filtered out of the view",
"article_txt_arr": [
"You can filter the question-answer pairs shown by date.",
"For example, you can filter out any question-answer pairs before a given date by setting the first date filter input.",
"In other words, you can show only question-answer pairs starting on or after a given date by setting the first date filter input."
],
"dom_id": "first_date_input",
"actions_arr": [
{ "function": "openFiltersMenu",
"values_needed_arr": []
},
{ "function": "setFirstDateInput",
"values_needed_arr": [ "date_str" ]
}
]
},
{
"element_name": "Final date filter input",
"task_description": "Setting this to a date causes question-answer pairs after this date to be filtered out of the view",
"article_txt_arr": [
"You can filter the question-answer pairs shown by date.",
"For example, you can filter out any question-answer pairs after a given date by setting the final date filter input.",
"In other words, you can show only question-answer pairs starting before or on a given date by setting the final date filter input."
],
"dom_id": "final_date_input",
"actions_arr": [
{ "function": "openFiltersMenu",
"values_needed_arr": []
},
{ "function": "setFinalDateInput",
"values_needed_arr": [ "date_str" ]
}
]
},
{
"element_name": "Page reload button",
"task_description": "Clicking this button causes the page to reload",
"article_txt_arr": [
"You can reload the page with the selected filters by clicking the reload page button."
],
"dom_id": "reload_btn",
"actions_arr": [
{ "function": "reloadPage",
"values_needed_arr": []
}
]
},
{
"element_name": "Close filters menu button",
"task_description": "Clicking this closes the filters menu",
"article_txt_arr": [
"You can close the filter menu by clicking the close filter X."
],
"dom_id": "filters_close_x",
"actions_arr": [
{ "function": "closeFiltersMenu",
"values_needed_arr": []
}
]
}
]
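If the app is written in TypeScript, the shape of this metadata might be captured with types like the following sketch (the field names simply mirror the JSON above):

// One action a GUI element can trigger.
interface GuiElementAction {
  function: string;             // name of the function to call
  values_needed_arr: string[];  // named values the function needs, e.g. "date_str"
}

// One entry in the GUI elements metadata.
interface GuiElement {
  element_name: string;             // meaningful, human-readable name
  task_description: string;         // what the element is for and why you would use it
  article_txt_arr: string[];        // longer "articles" used to ground question answering
  dom_id: string;                   // DOM identifier of the element
  actions_arr: GuiElementAction[];  // functions to call when the element is activated
}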
3: Create prompts to analyze user input
3.1: Classify user input
When a user submits NL input, our NL interface needs to distinguish between different user needs (or intents). For example, a user might be asking a how-to question to get an answer back, or they might be giving an instruction to cause an action.
For our example app, the following prompt can be used to cause an LLM to classify user input. At run time, the user input would be substituted for the user input placeholder:
Classify the following user input as one of: "question", "action_request",
or "none"
question: The user input is asking a question.
action_request: The user input is an instruction to take an action.
none: The user input is not a question and not an instruction.
User input: __PLACEHOLDER_FOR_USER_INPUT__
Class:
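In code, step 3.1 might look like the following sketch, where generateText is a hypothetical stand-in for whatever LLM client the app uses:

// Hypothetical LLM client: send a prompt, get back the generated text.
declare function generateText(prompt: string): Promise<string>;

type InputClass = "question" | "action_request" | "none";

const CLASSIFY_PROMPT_TEMPLATE = `Classify the following user input as one of: "question", "action_request",
or "none"
question: The user input is asking a question.
action_request: The user input is an instruction to take an action.
none: The user input is not a question and not an instruction.
User input: __PLACEHOLDER_FOR_USER_INPUT__
Class:`;

async function classifyInput(userInput: string): Promise<InputClass> {
  const prompt = CLASSIFY_PROMPT_TEMPLATE.replace(
    "__PLACEHOLDER_FOR_USER_INPUT__",
    JSON.stringify(userInput)  // quotes the input, as in the earlier examples
  );
  const output = (await generateText(prompt)).trim();
  // Treat anything other than the two expected labels as "none".
  return output === "question" || output === "action_request" ? output : "none";
}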
3.2: Detect which GUI element is related to user input
When a user asks a question or makes an action request, we must identify which GUI element the input is related to.
If the design of your GUI aligns well to users’ tasks, then the input users are likely to submit will be related to at least one GUI element.
For our example app, the following prompt can be used to identify which GUI element is most closely related to given user input. At run time, the element details placeholder can be pulled from our GUI elements metadata:
A web interface has the following elements:
__PLACEHOLDER_FOR_ELEMENT_DETAILS__
Which element is the following user message about?
__PLACEHOLDER_FOR_USER_INPUT__
Element:
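A sketch of how the element details can be rendered from the GUI elements metadata (using the GuiElement type from step 2):

// Render metadata entries into the element-details block of the prompt.
// The LLM is asked to answer with a dom_id, which the app can later validate
// against the metadata before acting on it.
function renderElementDetails(elements: GuiElement[]): string {
  return elements
    .map((el) => `Element: ${el.dom_id}\nDescription: ${el.task_description}`)
    .join("\n");
}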
3.3: Identify values in user input
If the user is giving an instruction that results in a value being assigned to a GUI element, we need to extract the value from the user input. And we need that value to be expressed in a way that can be used programmatically. For example, the number “five” will need to become the digit “5”.
For our example app, the following prompt can be used to extract any values specified in user input. This prompt is made up of the previous prompt, the output that prompt generated, and an extra piece causing the LLM to identify a value:
A web interface has the following elements:
__PLACEHOLDER_FOR_ELEMENT_DETAILS__
Which element is the following user message about?
__PLACEHOLDER_FOR_USER_INPUT__
Element: __PLACEHOLDER_FOR_ELEMENT_ID__
What value should the element be set to?
Value:
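Whatever the LLM generates, the app should validate the value before using it programmatically. Here’s a sketch for the date case, where the prompt’s date note pins the format to YYYY-MM-DD:

// Validate an extracted date string: it must match YYYY-MM-DD and be a real
// calendar date. Returns null when the generated value can't be trusted.
function parseDateValue(generated: string): string | null {
  const candidate = generated.trim();
  if (!/^\d{4}-\d{2}-\d{2}$/.test(candidate)) return null;
  // ISO strings with out-of-range parts (e.g. 2024-02-31) parse as Invalid Date.
  const parsed = new Date(candidate + "T00:00:00Z");
  return Number.isNaN(parsed.getTime()) ? null : candidate;
}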
4: Add application logic to respond to user input
4.1: Answer NL questions
If user input is classified as a question (prompt 3.1 above), our app must determine which element the question is about (prompt 3.2). Then, the app needs to prompt an LLM to answer the question, grounded in information about the related element.
For our app, the following prompt can be used to answer users’ questions grounded in information pulled in from the GUI elements metadata:
Article:
-------
## Element name
__ELEMENT_PLACEHOLDER__
## Task element performs
__TASK_PLACEHOLDER__
## About this element
__ARTICLE_PLACEHOLDER__
-------
Answer the following question using only information from the article.
Question: __USER_INPUT_PLACEHOLDER__
Answer:
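A sketch of assembling that grounded prompt from one element’s metadata (reusing the GuiElement type and the hypothetical generateText helper from earlier):

// Build the grounded question-answering prompt from one element's metadata.
function buildAnswerPrompt(element: GuiElement, userInput: string): string {
  return [
    "Article:",
    "-------",
    "## Element name",
    element.element_name,
    "## Task element performs",
    element.task_description,
    "## About this element",
    element.article_txt_arr.join("\n"),
    "-------",
    "Answer the following question using only information from the article.",
    `Question: ${userInput}`,
    "Answer:",
  ].join("\n");
}

async function answerElementQuestion(userInput: string, element: GuiElement): Promise<string> {
  return (await generateText(buildAnswerPrompt(element, userInput))).trim();
}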
4.2: Activate GUI elements
When user input is classified as an action request (prompt 3.1), the app needs to perform these steps:
- Identify which element the action request is related to (prompt 3.2)
- Extract any value specified in the user input (prompt 3.3)
- Execute the functions listed in the actions_arr specified in the metadata for the related element
Imagine our app receives the following user input:
Filter out everything after the end of January
This input would be handled by our app like this:
- The related element is final_date_input
- The value is 2024-01-31
- The functions to be executed are: openFiltersMenu and setFinalDateInput (with the date string value passed in)
To reload the page with the filters applied, the user would need to click the Reload button (or give NL input like “reload the page”).
When users see fields in your GUI being populated or activated, explaining what’s happening helps them understand how the system is responding to their input.
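Putting the action path together, the dispatch logic might look like the following sketch. The registry that maps function names from actions_arr to implementations is a hypothetical design choice, not the only way to wire this up:

// Map the function names used in actions_arr to implementations.
const functionRegistry: Record<string, (values: Record<string, string>) => void> = {
  openFiltersMenu:   () => { /* open the filters menu in the DOM */ },
  closeFiltersMenu:  () => { /* close the filters menu */ },
  reloadPage:        () => { /* reload the page with the current filters */ },
  setFirstDateInput: (v) => { /* set #first_date_input to v["date_str"] */ },
  setFinalDateInput: (v) => { /* set #final_date_input to v["date_str"] */ },
};

// Execute every action listed for the related element, passing each function
// only the values it declares in values_needed_arr.
function activateElement(element: GuiElement, values: Record<string, string>): void {
  for (const action of element.actions_arr) {
    const fn = functionRegistry[action.function];
    if (!fn) continue;  // metadata names an unknown function: skip it safely
    const needed: Record<string, string> = {};
    for (const name of action.values_needed_arr) {
      needed[name] = values[name];
    }
    fn(needed);
  }
}

For the January example, values would be { date_str: "2024-01-31" }, so the loop would fire openFiltersMenu and then setFinalDateInput.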
Scaling out this approach
The example above is very simple. Here are some considerations for applying this method when building NL interfaces for real apps.
More complex user input
- Different types of input — There will be other types of user input, beyond questions and action requests. And the types of user input will vary, depending on the nature of the app.
- Complex, multi-hop input — The example user input above related to only one element at a time. In reality, your users will give instructions related to multiple elements and that require multiple actions in response. One strategy for handling these cases is to preprocess user input to tease the input apart into its constituent pieces.
- Multiple input values — The example user input above included only one value (a date string) that needed to be passed to a function. In reality, some functions will need multiple parameters. When the parameters are of different data types, one strategy for handling these cases is to separate the extracted values by data type.
More complex apps
- Many GUI elements — Most apps will have so many elements that you cannot get good results pulling all the element metadata into prompts at once. (The LLM will get lost in the middle of all that information.) So your app will need to search for a subset of likely elements and include metadata for only those elements in prompts (see the retrieval sketch after this list).
- Generating element metadata — After your GUI is built, the bulk of your element metadata can be automatically extracted from application code.
- Leveraging documentation — Instead of having hard-coded articles in article_txt_arr, it’s more likely your apps would pull in articles from elsewhere, such as documentation, blog posts, etc.
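For the many-elements case, a retrieval step can narrow the metadata down before prompting (this is where the RAG machinery comes in). Here is a sketch using a hypothetical embed helper and plain cosine similarity; the element descriptions are embedded once at startup, not per query:

// Hypothetical embedding client; swap in your provider's API.
declare function embed(text: string): Promise<number[]>;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Embed each element's name and description once, at startup.
async function indexElements(elements: GuiElement[]): Promise<number[][]> {
  return Promise.all(
    elements.map((el) => embed(`${el.element_name}. ${el.task_description}`))
  );
}

// Per query, keep only the k elements most similar to the user input.
async function retrieveLikelyElements(
  userInput: string,
  elements: GuiElement[],
  vectors: number[][],
  k = 5
): Promise<GuiElement[]> {
  const inputVec = await embed(userInput);
  return elements
    .map((el, i) => ({ el, score: cosineSimilarity(inputVec, vectors[i]) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.el);
}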
UX design
- Separate filling in fields from submitting actions — In the example above, when the user asked to filter out all data after the end of January, the final_date_input field was filled in, but the Reload button was not fired. It would have been simple to add the reloadPage function to the actions_arr of the final_date_input element. But sometimes, taking two steps like this makes it easier for the user to see how the system is responding.
- Non-reversible actions — Consider whether some actions just shouldn’t be allowed through the NL interface. Just as GUIs use confirmation pop-ups today, it would be wise to implement some mechanism to prevent a misunderstood instruction from permanently deleting files, transferring all the money out of an account, replying to all with an email, or anything else potentially dangerous and non-reversible.
- Out-of-context input — Users will ask questions about things beyond the scope of your app and instruct the app to perform tasks that aren’t part of its programming. If you try to crowbar all input through logic intended for in-scope input, users will be unhappy with the results and keep rephrasing their input to try to get the app to do what they want. Instead, it’s best to build a component that can identify when input is out of scope and then relay that information to the user.
- LLM error handling — The suggested method above is structured to reduce both the potential for and the impact of LLM hallucination. For example, when the sample app above is classifying user input, if the LLM generates anything other than question or action_request, the application logic could instruct the user to rephrase the input. Similarly, when identifying the related element, the app would only move forward if the generated output is a valid DOM identifier listed in the GUI elements metadata (a validation sketch follows this list). In both cases, even if the LLM hallucinated bizarre or hateful output, the application logic would prevent user harm. For question answering, there are multiple strategies for identifying harmful output so that it is never returned to the user. In production apps, build safeguards for each step where an LLM generates output.
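As one example of such a safeguard, validating a generated element identifier against the metadata takes only a few lines (a sketch using the GuiElement type from earlier):

// Only act on a generated element identifier if it is one the app listed.
// Anything else, including bizarre hallucinations, triggers a rephrase request.
function validateElementId(generated: string, elements: GuiElement[]): GuiElement | null {
  const id = generated.trim();
  return elements.find((el) => el.dom_id === id) ?? null;
}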
Conclusion
In time, common NL interface patterns will emerge that users will recognize and even come to expect. But for now, at the beginning of the transition to widespread adoption of NL interfaces, combining GUI elements with NL input is the best of both worlds: the structure of the GUI design helps us build the NL interface, and the elements of the GUI give users cues about what tasks they can perform and how to talk about them. Like having a safety net, having a GUI makes it possible to be more daring and innovative with your NL interface.