Output extraction is the process of converting messy, unstructured inputs into structured data fields that downstream systems can use. At Infinitus, this capability powers our digital healthcare infrastructure, enabling turnaround times as short as 30 minutes for customers.

Back-office healthcare tasks like benefit verification require pulling data from multiple sources: payor APIs, IVR systems, call transcripts, and even images and PDF documents. When possible, we collect this data digitally, the fastest path to the information we need. When data can't be acquired digitally, the fallback is a phone call, which slows workflows and adds manual effort.

Infinitus applies output extraction across a wide range of sources, including healthcare call transcripts, images, and third-party API responses, to reliably map unstructured data into standardized internal field values for each task. By making the most of these digital sources, Infinitus ensures smoother workflows and shorter turnaround times for our customers.

How does Infinitus scale output extraction?

Traditional solutions haven’t scaled:

  • Manual prompt engineering improves accuracy but is costly to maintain and lacks systematic evaluation.
  • Fine-tuning creates powerful custom models but demands large datasets, heavy compute, and dedicated teams – rarely justified for most healthcare tasks.

Infinitus addresses these challenges with two workflows: optimization and inference.

Optimization

In the optimization workflow, shown on the right in the illustration below, relevant examples such as API responses, transcripts, and images are gathered from historical human-labelled data, then split into training and test sets. The system then uses DSPy optimizers to tune the prompt for production-quality extraction. This process combines prompt engineering with few-shot learning to iteratively improve the accuracy of extracted outputs, measured against a specialized evaluation metric that rewards accuracy and penalizes speculative responses.

The resulting optimized prompt is then compiled and saved to cloud storage. This workflow can be scheduled to re-run so prompts stay up to date and accurate even as data shifts.
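As a rough sketch of the data-gathering step (the field names and split ratio here are illustrative, not our production values), the labelled history can be shuffled and split before optimization:

```python
import random

def split_examples(labelled_history, test_fraction=0.2, seed=42):
    """Shuffle historical human-labelled examples and split them
    into training and test sets for prompt optimization."""
    examples = list(labelled_history)
    random.Random(seed).shuffle(examples)
    cutoff = int(len(examples) * (1 - test_fraction))
    return examples[:cutoff], examples[cutoff:]

# Each example pairs a raw source with its human-validated fields.
history = [
    {"source": f"api_response_{i}", "labels": {"plan_active": True}}
    for i in range(10)
]
train_set, test_set = split_examples(history)
```

A fixed seed keeps the split reproducible across scheduled re-runs, so successive optimizations are evaluated against a stable test set.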

*A diagram of the inference and optimization workflows*

In healthcare, guaranteeing accuracy and eliminating hallucinations are critical. To encourage this conservative behavior and build confidence, we employ a specialized evaluation metric during training. DSPy accepts a user-defined scoring function, which makes it easy to configure risk tolerance for any given output. The scoring guide below shows how we ensure the model abstains rather than guesses on critical fields, while being more forgiving on supportive information where downstream normalization handles variations.

Scoring framework

| Field type | Correct answer | Unknown/blank | Incorrect answer |
|---|---|---|---|
| Critical | 1.0 | 0.5 | 0.0 |
| Important | 1.0 | 0.2 | 0.0 |
| Supportive | 1.0 | 0.0 | 0.0 |
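A DSPy metric is just a Python function that scores a prediction against a labelled example. A minimal sketch of the scoring framework above (the field names and example structure are illustrative, not our production schema):

```python
# Partial credit for abstaining ("unknown") per field type,
# mirroring the scoring framework above.
UNKNOWN_CREDIT = {"critical": 0.5, "important": 0.2, "supportive": 0.0}

def score_field(field_type, predicted, expected):
    """Score one extracted field: 1.0 if correct, partial credit
    for abstaining, 0.0 for a wrong guess."""
    if predicted in (None, "", "unknown"):
        return UNKNOWN_CREDIT[field_type]
    return 1.0 if predicted == expected else 0.0

def extraction_metric(example, prediction):
    """Average per-field scores; a DSPy metric has a similar shape
    (example, prediction, trace=None) and returns a float."""
    scores = [
        score_field(ftype, prediction.get(name), expected)
        for name, (ftype, expected) in example.items()
    ]
    return sum(scores) / len(scores)
```

With these weights, a model that abstains on a critical field scores strictly better than one that guesses wrong, which is exactly the conservative behavior the optimizer is steered toward.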

Besides this evaluation metric, we also define signatures for the model's inputs and outputs. Structured outputs are another mechanism that guides the language model toward the determinism healthcare extraction tasks require.
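A signature names the inputs a model receives and the typed outputs it must return; in DSPy this is a `dspy.Signature` subclass with `InputField`/`OutputField` declarations. The stdlib sketch below (with hypothetical field names) shows the structured-output side of the same idea, forcing free-form model text into a fixed, typed shape:

```python
import json
from dataclasses import dataclass

@dataclass
class BenefitFields:
    """Output side of a hypothetical benefit-verification signature."""
    plan_active: bool
    copay: str
    deductible_remaining: str

def parse_structured_output(raw_model_output: str) -> BenefitFields:
    """Parse model output into the declared output fields,
    failing loudly if a field is missing or mistyped."""
    data = json.loads(raw_model_output)
    fields = BenefitFields(
        **{k: data[k] for k in ("plan_active", "copay", "deductible_remaining")}
    )
    if not isinstance(fields.plan_active, bool):
        raise ValueError("plan_active must be a boolean")
    return fields
```

Failing loudly on a malformed response is deliberate: a parse error is caught by the service, whereas a silently mis-typed field would flow into downstream processes.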

Now that we have defined the scoring metric and signatures, the optimizer can leverage our historical data, real API responses with human-validated outputs, and tune a prompt with the most helpful examples and instructions. The result? A healthcare agent capable of interpreting benefits responses and identifying plan details, supercharging back-office healthcare tasks.

The newly compiled program can be packaged as a compressed file and stored in the cloud, waiting for tasks. When an extraction request arrives from a deployment, the signatures we defined allow us to return a structured JSON object, so the calling entity can parse the response according to the expectations of its specific business need.
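As a simplified stand-in for that packaging step (the file layout and helper names are illustrative, not our production format), a compiled prompt and its few-shot demos can be written as a compressed artifact keyed by model ID:

```python
import gzip
import json
from pathlib import Path

def save_program(model_id: str, instructions: str, demos: list, store_dir: str) -> Path:
    """Package an optimized prompt (instructions + few-shot demos)
    as a compressed JSON artifact, keyed by model ID."""
    artifact = {"model_id": model_id, "instructions": instructions, "demos": demos}
    path = Path(store_dir) / f"{model_id}.json.gz"
    path.write_bytes(gzip.compress(json.dumps(artifact).encode("utf-8")))
    return path

def load_program(model_id: str, store_dir: str) -> dict:
    """Fetch and decompress the stored program at inference time."""
    path = Path(store_dir) / f"{model_id}.json.gz"
    return json.loads(gzip.decompress(path.read_bytes()).decode("utf-8"))
```

Keying artifacts by model ID is what lets the inference service stay model-agnostic: it only needs an ID to retrieve the right program.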

Inference

At several key points in the customer task lifecycle, external data is needed before subsequent internal processes can continue. Although the specific content may differ, the extraction method remains the same regardless of which model is used, because every trained model is stored and retrieved the same way.

The semi-structured external content and the relevant model ID are sent in an extraction request to the Output Extraction Service, which kicks off the inference workflow. The service first retrieves the use-case-specific optimized prompt from cloud storage, then parses the data and formats the output. Finally, the Output Extraction Service responds to the API Server with the extracted, structured data, which downstream processes can use reliably.
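A minimal sketch of that inference path (the function names are illustrative and the language-model call is stubbed out):

```python
def handle_extraction_request(content: str, model_id: str, prompt_store: dict, run_model) -> dict:
    """Inference workflow: look up the optimized prompt for this
    model ID, run extraction over the semi-structured content, and
    return structured fields for the API server."""
    prompt = prompt_store[model_id]               # retrieve optimized prompt
    fields = run_model(prompt, content)           # LM call (stubbed below)
    return {"model_id": model_id, "fields": fields}

# Stubbed model call: a real deployment would invoke the compiled program.
def fake_model(prompt, content):
    return {"plan_active": "active" in content.lower()}

store = {"benefits-v2": "Extract benefit fields; abstain if unsure."}
result = handle_extraction_request("Plan status: ACTIVE", "benefits-v2", store, fake_model)
```

Because the prompt lookup and response shape are fixed, only `model_id` and the content vary between use cases, which is what keeps the service generic.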

The results below show significant improvements in coverage and reliability: the optimized approach achieves 94% coverage, compared to 71% for manual prompt engineering and 50% for legacy parsers.


| Metric | Legacy parsers | Prompt engineering |
|---|---|---|
| Coverage | 50% | 71% |
| False positive rate | <1% | 15% |
| Manual review required | 50% | 29% |

How Infinitus guarantees safety in output extraction workflows

Infinitus has three guardrails in place throughout this process to ensure accuracy in every deployment:

  • Shadow mode: New models run quietly, logging extractions without affecting production outputs, enabling performance validation on live traffic
  • Holdout: A configured number of tasks are held back and routed to a human for manual review
  • Post-processing evaluation: Before tasks are submitted to customers, they receive automated review by rules-based checks, statistical models, and an LLM-as-judge to identify anomalies and contradictory outputs
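For the post-processing guardrail, the rules-based layer can be as simple as explicit contradiction checks over the extracted fields (the rules below are hypothetical examples, not our production ruleset):

```python
def find_contradictions(fields: dict) -> list:
    """Flag internally inconsistent extractions for human review
    before a task is returned to a customer."""
    issues = []
    if fields.get("plan_active") is False and fields.get("copay") is not None:
        issues.append("copay reported for an inactive plan")
    if fields.get("deductible_remaining", 0) > fields.get("deductible_total", float("inf")):
        issues.append("remaining deductible exceeds total deductible")
    return issues
```

Any flagged task can then be diverted to the same manual-review path as the holdout set rather than being returned to the customer.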

By using optimized prompt programs for output extraction, Infinitus is pushing healthcare into the AI revolution safely and accurately. Deploying optimized programs as described above has been a cornerstone of Infinitus' ability to significantly increase the number of fully digitized tasks in the past few months, representing over 18,000 more patients whose providers received verification of benefits nearly instantaneously.

This capability ensures that customers waiting for insurance information about potentially life-saving therapies receive timely and accurate information, enabling both patients and doctors to make the best-informed decisions as quickly as possible about their treatment options and care plans.

If you’d like to learn more about the work our engineering team is tackling at Infinitus or see our open roles, visit our Careers page.