CLINICAL STUDY

Evaluating patient-facing AI agents

A study of approximately 6,000 real patient calls, measuring clinical accuracy, empathy, and what AI agents can actually handle in healthcare today.

AI agents evaluated at clinical scale

As AI agents move into patient conversations, "does it work?" isn't enough. Healthcare demands proof of clinical accuracy, patient experience, and safety, all at the same time.

In this study, Infinitus analyzed approximately 6,000 patient-facing calls spanning three interaction types: health risk assessment, cholesterol education, and PHQ-2 depression screening. The researchers evaluated Infinitus voice AI agents against the metrics that matter most in clinical settings: clinical accuracy, empathy, and safety.

Here’s what you’ll find inside

If you're responsible for deploying AI in patient-facing workflows, this study offers a transparent look at how such agents can be rigorously evaluated, along with evidence that the bar can be met.

Patient experience data at scale

How do patients actually feel talking to an AI about their health? The professionalism and empathy results, broken down by interaction type, may surprise you, including why one agent type sets a measurably higher bar.

Clinical accuracy results that hold up

For a screening tool designed to catch patients who need clinical follow-up, one metric matters more than any other. See how the PHQ-2 agent performed on the error type that can’t be tolerated.

A deep dive on our evaluation framework

Patient surveys, clinical accuracy analysis, human expert review, and LLM-as-judge scoring – because no single method captures the full picture. This study shows how Infinitus layers them.

Get the data on patient-facing AI at clinical scale

See how Infinitus measured clinical accuracy, empathy, and safety across 6,000 patient calls