Technology at Infinitus

Shyam Rajagopalan
CTO

At Infinitus, we are creating an AI solution to help our customers automate their outbound phone calls. When Ankit and I first chatted about solving this problem, it seemed technically daunting: we had to build a platform that could hold a conversation lasting over 30 minutes with minimal errors, if any. While most conversational AI platforms can handle conversation state across 8-10 turns (one utterance by each participant in a conversation), our system has been built to handle state across over 100 turns. I’m happy to share that just a year and a half later, our system has successfully completed over ten thousand such phone calls. In this post, I outline some of the breakthroughs we have made and preview some of the exciting technical challenges that lie ahead.

The Infinitus Tech Stack

Every phone call starts with our VoIP layer. Unlike traditional short-lived REST requests, the data for a phone call flows over longer-lived connections like WebSockets and WebRTC, which have unique networking requirements. We have designed a backend architecture that lets us cooperatively work on all the different dataflows an Infinitus phone call needs - transcribing and recording audio, Natural Language Processing (NLP), speech synthesis, etc. - all within the context of a single long-lived connection. In the future, we will explore alternate VoIP architectures and build better traceability to harden our system against dropped connections and packets.
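The post doesn't show how those dataflows share one connection, but the pattern can be sketched with Python's asyncio: each incoming audio frame is fanned out to every per-call dataflow cooperatively on a single event loop. All names here are hypothetical, and the "transcription" is a placeholder, not a real STT call.

```python
import asyncio

async def record(frame, store):
    # Append the raw audio frame to the call's recording buffer.
    store.append(frame)

async def transcribe(frame, transcript):
    # Placeholder STT: pretend each frame decodes to one token.
    transcript.append(f"token-{frame}")

async def handle_call(frames):
    """Fan each incoming audio frame out to the call's dataflows
    (recording, transcription, ...) within one long-lived session."""
    recording, transcript = [], []
    for frame in frames:
        # Run the per-frame dataflows cooperatively on the event loop.
        await asyncio.gather(record(frame, recording),
                             transcribe(frame, transcript))
    return recording, transcript

recording, transcript = asyncio.run(handle_call([0, 1, 2]))
```

In a real system the frame source would be a WebSocket or WebRTC stream rather than a list, and each dataflow would feed a downstream pipeline stage instead of an in-memory buffer.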

After our VoIP layer, the first step of our AI pipeline is Speech to Text (STT). Because it is the input layer for a multi-stage NLP pipeline, errors from STT can cascade through the rest of our system. We have picked a third-party STT system that provides a model customized for the type of audio we receive. We have incorporated a domain-specific vocabulary into our STT model and built a custom evaluation model that ranks the multiple alternative transcriptions the STT system generates. To further improve our STT layer, we plan to explore custom acoustic and language models built on our audio libraries.
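One simple way to picture re-ranking STT alternatives with a domain vocabulary: score each hypothesis by how many domain terms it contains, breaking ties with the engine's own confidence. This is a minimal sketch, not Infinitus's actual evaluation model; the vocabulary and scoring are illustrative assumptions.

```python
# Hypothetical slice of a healthcare-domain vocabulary.
DOMAIN_VOCAB = {"copay", "deductible", "prior", "authorization"}

def rank_alternatives(alternatives):
    """Re-rank STT alternatives: prefer hypotheses that use more
    domain-specific vocabulary, then fall back to STT confidence.

    `alternatives` is a list of (text, confidence) pairs."""
    def score(alt):
        text, confidence = alt
        hits = sum(1 for word in text.lower().split() if word in DOMAIN_VOCAB)
        return (hits, confidence)
    return max(alternatives, key=score)

best = rank_alternatives([
    ("the co pay is twenty dollars", 0.81),
    ("the copay is twenty dollars", 0.78),
])
# picks the hypothesis containing the domain term "copay",
# even though the STT engine gave it lower confidence
```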

Our NLP system processes the output of STT and, given the context of the call thus far, decides how to respond (if at all). Our conversational engine takes into account the most recent utterance from the other party, as well as the dialog state (inputs to the call, outputs collected, and the last N utterances). Our engine is a combination of multiple models designed for the different stages of most B2B healthcare conversations. This gives our customers the flexibility to tailor the engine to their needs, while allowing us to support a wide variety of customers on a robust, scalable platform. As a result, our NLP system faces many engineering challenges, like tracking long-running conversation state and context. We also have traditional scientific challenges, like training and evaluating different types of models for each type of context, industry, and customer. As we make many more calls through our system, we are excited about the opportunity to evaluate how recent advances in ML infrastructure and NLP, such as transformers, can be applied to our domain.
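The dialog state described above - inputs to the call, outputs collected, and a sliding window of the last N utterances - can be sketched as a small data structure. The field names and window size are assumptions for illustration, not the actual Infinitus schema.

```python
from collections import deque
from dataclasses import dataclass, field

N_CONTEXT = 5  # hypothetical size of the recent-utterance window

@dataclass
class DialogState:
    """Tracks the long-running state of a single call: task inputs,
    outputs collected so far, and the last N_CONTEXT turns."""
    inputs: dict
    outputs: dict = field(default_factory=dict)
    history: deque = field(default_factory=lambda: deque(maxlen=N_CONTEXT))

    def observe(self, speaker, utterance):
        # Older turns fall off automatically once the window is full.
        self.history.append((speaker, utterance))

state = DialogState(inputs={"member_id": "A123"})
for turn in range(7):
    state.observe("agent", f"utterance {turn}")
# only the most recent N_CONTEXT turns remain in state.history
```

Bounding the window keeps per-turn inference cheap even on 100-turn calls, while `inputs` and `outputs` carry the state that must persist for the whole conversation.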

One of the areas we have had to invest in deeply is our data labeling platform. Collecting structured data with appropriate labels is vital for building ML models, and when we looked at our options, we realized that many data labeling services work great for images and, with a bit of custom work, work just OK for labeling a single utterance. However, most tools really struggle to label full multi-turn conversations, so we have built a sophisticated set of frontend UX tools to aid us here. Beyond labeling data for model training, these tools have been extended to let us debug the calls our AI system has completed in the past, which has allowed us to iterate quickly to improve all of our systems.
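What makes multi-turn labeling different from single-utterance labeling is that each label must keep its place in the conversation. A minimal sketch of such a label record (the fields are hypothetical, not the actual Infinitus schema) might group per-turn labels under a shared call ID:

```python
def make_label(call_id, turn, speaker, text, intent, slots):
    """One labeled turn of a conversation. `call_id` and `turn` preserve
    the conversational context that single-utterance tools discard."""
    return {
        "call_id": call_id,
        "turn": turn,
        "speaker": speaker,
        "text": text,
        "intent": intent,   # e.g. the dialog act of this turn
        "slots": slots,     # structured values extracted from the turn
    }

labels = [
    make_label("call-42", 0, "payer", "What is the member ID?",
               intent="request_member_id", slots={}),
    make_label("call-42", 1, "infinitus", "The member ID is A123.",
               intent="provide_member_id", slots={"member_id": "A123"}),
]
```

Because labels are ordered within a call, the same records can double as a replay log when debugging a completed conversation.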

To tie all of this together for our customers, we have designed and deployed an easy-to-use set of APIs, along with customer portals, to both create tasks for the Infinitus AI to perform and process the results of completed tasks. We have taken a security-first mindset: we implemented a role-based access control system for viewing data in our customer portal, isolated each customer’s data, and engaged security vendors to analyze and fix potential vulnerabilities. Another aspect of delivering value to our customers is teaching our AI systems the business rules and business intelligence our customers have learned over the years about the data these phone calls collect. While our AI can and will eventually learn these rules on its own, by teaching it directly we can incorporate this intelligence from day zero. To achieve this, we will be collaborating with our customers to design rule systems and domain-specific languages to validate all of the data the Infinitus AI collects.
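The rule systems mentioned above can be pictured as a table of declarative checks applied to the data a call collects. This is a toy sketch under assumed field names and rules, not the DSL being designed with customers:

```python
# Hypothetical customer rules: (field, predicate, error message).
RULES = [
    ("copay", lambda v: v is not None and v >= 0,
     "copay must be present and non-negative"),
    ("deductible", lambda v: v is None or v >= 0,
     "deductible must be non-negative"),
]

def validate(collected):
    """Apply each business rule to the data collected on a call
    and return the list of violated-rule messages."""
    errors = []
    for name, predicate, message in RULES:
        if not predicate(collected.get(name)):
            errors.append(message)
    return errors

clean = validate({"copay": 20, "deductible": 500})   # no violations
flagged = validate({"copay": -1})                    # violates the copay rule
```

Keeping rules as data rather than code is what makes a customer-facing DSL possible: each customer supplies their own table without changing the validation engine.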

While it is humbling to see how much we have built with a ‘one pizza engineering team’ in the last 16 months, I am uncomfortably excited about all that we have left to do. If you would like to join our amazing team on this exciting journey, we have a number of open engineering roles across frontend, backend infrastructure and services, and NLP.


