Recently, large language models (LLMs) have made huge advances in powering conversational systems.

In many practical industry settings, the conversational AI systems we want to build are task-oriented dialogue systems (TODs). In these systems, the goal is to help a user complete a task or to collect certain information. For many of these applications, it is critical to follow specific procedures or guidelines. For example, in a banking application one must verify identity before providing an account balance. In the healthcare setting, we must complete HIPAA verification before requesting benefit information. And if a patient has a PPO plan rather than an HMO plan, a different set of information must be collected to make sure the care they need is properly covered.

While LLMs can model these dialogue guidelines, existing pretrained models cannot do so out of the box, because they have not seen enough data in which these domain-specific procedures are followed. We can fine-tune LLMs for these settings, but fine-tuning requires large amounts of data for the model to learn the procedures implicitly. In some settings, the supply of clean training conversations may be limited, especially if the expected agent procedures continue to evolve over time.

Focusing on ‘explicitly’ stored guidelines

In practical settings, there are often agent guidelines available, such as company policies, customer service manuals, or documents on standard operating procedures. In our work, we explore leveraging these explicitly stored agent guidelines to improve LLMs for action prediction in low-data settings.


Our proposed system, KADS, combines an LLM with a knowledge retrieval module:

  • The knowledge retriever takes the agent-customer dialogue as input and retrieves the most relevant guidelines from a knowledge base of agent procedures.
  • The language model is conditioned on both the retrieved document and the ongoing dialogue to predict the action sequence.
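In the actual system, both the retriever and the language model are learned neural modules. As a rough illustration of the data flow only, here is a toy Python sketch in which a bag-of-words overlap stands in for the learned embedder; the function names (`embed`, `retrieve`, `build_model_input`) and the prompt format are hypothetical, not the paper's implementation:

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words 'embedding'; a stand-in for the learned encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(dialogue, knowledge_base, top_k=1):
    """Return the top_k guideline documents most relevant to the dialogue."""
    d = embed(dialogue)
    ranked = sorted(knowledge_base, key=lambda doc: cosine(d, embed(doc)), reverse=True)
    return ranked[:top_k]

def build_model_input(dialogue, knowledge_base):
    """Condition the LM on both the retrieved guideline and the ongoing dialogue."""
    doc = retrieve(dialogue, knowledge_base)[0]
    return f"guideline: {doc} dialogue: {dialogue}"

kb = [
    "verify identity before providing an account balance",
    "complete HIPAA verification before requesting benefit information",
]
print(build_model_input("customer: what is my account balance?", kb))
```

A balance question pulls in the identity-verification guideline, so the downstream action predictor can see the relevant procedure alongside the conversation.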

Our approach to training the system

To train our system, we:

  1. Start with a dialogue-document matching task to warm up the embedder modules
  2. Pre-train the full model with an action-oriented masked language model task: instead of masking arbitrary tokens, we mask the actions in a dialogue's action sequence
  3. Finally, train on one of two downstream tasks, Action State Tracking or Workflow Discovery
    1. Action State Tracking (AST) predicts the next action at a given point in the conversation. This is useful in a live conversation to decide which action to take.
    2. Workflow Discovery (WD) takes a full dialogue as input and predicts the full sequence of actions, i.e. the expected workflow. This is useful as an offline system for determining an ideal workflow from both guideline documents and real conversation data. That workflow can then be used to build out one of a number of approaches for AST, which can be used in production on live calls.
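The action-masking idea in step 2 can be sketched as follows. This is a simplified illustration, not the paper's implementation: the `mask_actions` helper, the mask token, and the action names are all made up for the example. The key point is that only action tokens become prediction targets, while ordinary utterances pass through unchanged.

```python
import random

def mask_actions(turns, action_set, mask_token="<mask>", p=0.5, rng=None):
    """Build a masked-action training pair: action tokens in the sequence are
    replaced with a mask token; regular utterances are left untouched."""
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for turn in turns:
        if turn in action_set and rng.random() < p:
            inputs.append(mask_token)
            targets.append(turn)  # the model must recover the masked action
        else:
            inputs.append(turn)
    return inputs, targets

actions = {"pull-up-account", "verify-identity", "offer-refund"}
turns = ["hi, I want a refund", "pull-up-account", "verify-identity", "sure", "offer-refund"]
inputs, targets = mask_actions(turns, actions, p=1.0)
print(inputs)   # all three actions replaced by <mask>
print(targets)  # the actions the model must reconstruct
```

Training the model to reconstruct `targets` from `inputs` (together with the retrieved guideline document) focuses pre-training on exactly the tokens that matter for action prediction.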

We’ve experimented with these approaches on our internal datasets for production settings, but from a publication standpoint we also applied the model to public datasets, where we can discuss results openly and compare directly against other state-of-the-art approaches.

We evaluated our system on the ABCD and SGD datasets, and found that our proposed model KADS outperforms existing systems for AST on all evaluated datasets. For WD, KADS outperforms T5 models on ABCD, but not on SGD. This may be due to the nature of SGD dialogues, which contain multiple client requests, while our model is currently augmented with a single document. As a next step, we could investigate leveraging multi-document retrieval.
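To make the multi-document idea concrete: a hypothetical extension (not part of KADS as evaluated) would keep the top-k retrieved guidelines instead of a single one, so that each client request in an SGD-style dialogue can be matched to its own procedure. The sketch below uses a crude token-overlap score purely for illustration:

```python
def overlap(a, b):
    """Crude relevance score: number of shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def retrieve_top_k(dialogue, knowledge_base, k=2):
    """Rank guideline documents by relevance and keep the k best."""
    return sorted(knowledge_base, key=lambda d: overlap(dialogue, d), reverse=True)[:k]

def build_multi_doc_input(dialogue, knowledge_base, k=2):
    """Concatenate several retrieved guidelines instead of a single document."""
    docs = retrieve_top_k(dialogue, knowledge_base, k)
    return " ".join(f"guideline: {d}" for d in docs) + f" dialogue: {dialogue}"

kb = [
    "restaurant booking: confirm party size and time",
    "taxi booking: confirm pickup location",
    "banking: verify identity first",
]
dialogue = "user: book a restaurant for tonight, then get me a taxi"
print(build_multi_doc_input(dialogue, kb))
```

Here a dialogue containing two requests surfaces both the restaurant and the taxi guidelines, while the irrelevant banking procedure is dropped.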

Promising results for the future

Overall, our results offer a promising outlook for action prediction given dynamic guidance from structured procedural documents. Another benefit of dynamically pulling from explicit guidelines is that the system can be updated quickly when guidelines are added or evolve over time, which is common in industry. These results are an exciting step toward incorporating guidelines into LLMs for practical settings.

To learn more about the work we’re doing, we invite you to read the full research paper, which is available here.