The Mod9 difference


Leading results with industry benchmarks.

Deep Neural Network (DNN) architecture.


Add domain-specific keywords, phrases and jargon “on the fly”.

Easily integrates with your systems, solutions and workflows.


Recorded audio processed in parallel.

Realtime audio processing that scales horizontally.

Cost Effective

Market-leading channel density (channels per CPU).

Flexible licensing options that work for your business.

What customers are saying

Mod9 Technologies helps us deliver an industry-leading ediscovery solution for investigations and litigations. We selected the Mod9 ASR engine for its accuracy and speed in the transcription and analysis of recorded audio content, as well as the product’s ability to be deployed within our own environment, which reduces the complexity, operational cost, and risk of managing our customers’ sensitive data offsite.

AJ Shankar

CTO and Founder, Everlaw

In delivering a world class AI coaching and training platform for contact center agents, VoiceOps needed an accurate, customizable, real-time, transcription engine, ideally compatible with the IBM Watson API. Mod9 was able to deliver against these requirements, offering highly-accurate transcription and analysis of spoken conversations in real-time at a reduced cost, delivering valuable, actionable and data-driven feedback for coaches, raising the performance of the contact center in weeks, versus months.

Nate Becker

Co-founder, VoiceOps

As a Global Leader in AI Training Data Services and Software, BasicAI is committed to providing high-quality data annotation services and software. BasicAI selected Mod9 as a partner because of the highly customizable nature of their ASR solution and deep expertise in the space.  Our customers can now utilize our labeled speech-to-text datasets to build custom solutions for languages, dialects, and industry-specific terminology to satisfy their use cases.

Tyler Schulze

CEO, BasicAI

Capabilities and characteristics

A core capability of the Mod9 ASR Engine is automatic transcription of conversations, either in real time or from recorded audio. The Engine can be configured with either large or small vocabularies, and it can also be customized by adding keywords and phrases “on the fly”, which is ideal for domain-specific applications.

Conversational intelligence

Real-time and batch transcript generation for live conversations or recorded audio.

Smart language models

Continuously training, improving and introducing new language and acoustic models.

Narrow and wideband audio

For 8 kHz (telephony) and 16 kHz (audio/video) applications and requirements.

Large and small vocabularies

For conversational (real-time or recorded) and directed dialogue applications.


Enables speech recognition and transcript generation in parallel and at massive scale.


Include domain-specific vocabularies “on the fly”. Tune for application-specific use cases.
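The narrowband and wideband distinction above (8 kHz telephony versus 16 kHz audio/video) can be checked on a recording before it is submitted for transcription. The following is a minimal sketch using only Python's standard `wave` module; the function name `audio_band` and the labels it returns are illustrative assumptions, not part of the Mod9 product.

```python
import wave

# Sample rates named in the text above; the labels are illustrative.
BANDS = {8000: "narrowband", 16000: "wideband"}


def audio_band(wav_source) -> str:
    """Return "narrowband" or "wideband" for a WAV file path or file object.

    Raises ValueError for any other sample rate, so the caller can
    decide whether to resample the audio first.
    """
    with wave.open(wav_source, "rb") as wav:
        rate = wav.getframerate()
    if rate not in BANDS:
        raise ValueError(f"unexpected sample rate: {rate} Hz")
    return BANDS[rate]
```

A caller might use the returned label to choose between a telephony model and an audio/video model, if the deployment offers both.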

Speech recognition and results (JSON format)

The Mod9 ASR Engine is asynchronous: it returns results while it is still receiving audio. While processing audio data, the Engine responds with one or more JSON-formatted messages representing the ASR result. In addition to the “1-best” hypothesis, and depending on how configuration options are set, additional metadata may be returned and further processing may take place.
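Because results arrive as a series of JSON messages while audio is still being sent, a client has to reassemble messages from whatever byte chunks the network delivers. The sketch below assumes that each message is a single JSON object terminated by a newline; that framing, and the `final` and `transcript` field names in the usage example, are assumptions made for illustration rather than a statement of the Engine's actual protocol.

```python
import json
from typing import Iterable, Iterator


def iter_json_messages(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per newline-terminated line.

    `chunks` may split a message at any byte boundary; incomplete data
    is buffered until the terminating newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Accept a last message that was not newline-terminated.
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    # Hypothetical messages: a partial result followed by a final one.
    received = [
        b'{"final": false, "transcript": "hel',
        b'lo"}\n{"final": true, "transcript": "hello world"}\n',
    ]
    for message in iter_json_messages(received):
        print(message)
```

Parsing incrementally like this lets an application display partial results as they arrive instead of waiting for the whole recording to finish.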

Word confidence scores

To gauge how certain the Engine is that a returned word matches what was actually spoken.

Word level timestamps

To maintain a chronological order and help recreate events.

Partial results

For improved recognition speed and naturalness in the response.

Natural language processing (NLP)

Automatic transcript formatting: adding punctuation and capitalization, handling disfluencies, and more.

Speaker labels

So that specific speakers can be easily identified in the transcript.


For scenarios where multi-channel recordings are unavailable.
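Word confidence scores and word-level timestamps, as listed above, are typically combined so that a reviewer can jump straight to the uncertain parts of a recording. The sketch below assumes a hypothetical per-word structure with `word`, `confidence` (0 to 1) and `interval` (start and end time in seconds) fields; the real field names and value ranges in the Engine's JSON output may differ.

```python
from typing import Dict, List, Tuple


def low_confidence_spans(
    words: List[Dict], threshold: float = 0.6
) -> List[Tuple[str, float, float]]:
    """Return (word, start, end) for every word scored below `threshold`."""
    spans = []
    for entry in words:
        if entry["confidence"] < threshold:
            start, end = entry["interval"]
            spans.append((entry["word"], start, end))
    return spans


if __name__ == "__main__":
    # Hypothetical result fragment, for illustration only.
    example_words = [
        {"word": "please", "confidence": 0.97, "interval": [0.00, 0.42]},
        {"word": "hold", "confidence": 0.41, "interval": [0.42, 0.80]},
    ]
    for word, start, end in low_confidence_spans(example_words):
        print(f"review {word!r} between {start:.2f}s and {end:.2f}s")
```

The threshold of 0.6 is an arbitrary starting point; a suitable value depends on the audio quality and on how costly a missed error is for the application.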

Architecture and interface

The Mod9 ASR Engine is a multi-threaded TCP server that follows a client/server architecture and is deployed in your data center or private cloud. To help you get up and running quickly, a generic Python client application with a sophisticated command line interface is provided, and custom TCP clients can also be developed.

TCP server

A multi-threaded TCP server.
Docker® container “packaged”.
Native Linux support (CentOS, Ubuntu).
Full duplex communication, custom protocol.
JSON format (commands, options and results).
Kaldi for Deep Neural Network (DNN) capabilities.
On-premises (in your data center) or private cloud.
Edge device support options.


Python application, command line interface (CLI).
Support for custom client development (TCP socket).
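To illustrate what a custom TCP client could look like, the sketch below sends a JSON options line followed by raw audio on one thread while reading JSON replies on another, which mirrors the full duplex behavior described above. The wire format is an assumption made for illustration: a newline-terminated JSON options object, then audio bytes, with end of audio signaled by closing the write side of the socket, and newline-delimited JSON replies. The function names are hypothetical and the Engine's custom protocol should be taken from its own documentation.

```python
import json
import socket
import threading
from typing import Iterable, Iterator


def send_request(sock: socket.socket, options: dict,
                 audio_chunks: Iterable[bytes]) -> None:
    """Send the options line, then the audio, then signal end of audio."""
    sock.sendall(json.dumps(options).encode("utf-8") + b"\n")
    for chunk in audio_chunks:
        sock.sendall(chunk)
    # Assumption: closing the write half tells the server the audio is done.
    sock.shutdown(socket.SHUT_WR)


def recognize(sock: socket.socket, options: dict,
              audio_chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield reply messages while audio is still being sent (full duplex)."""
    sender = threading.Thread(
        target=send_request, args=(sock, options, audio_chunks), daemon=True
    )
    sender.start()
    with sock.makefile("rb") as replies:
        for line in replies:
            if line.strip():
                yield json.loads(line)
    sender.join()
```

In use, `sock` would come from something like `socket.create_connection((host, port))`, with the host and port of your own deployment, and `audio_chunks` could be a generator that reads a file or a live audio source in small blocks.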