Why and How to Perform an Automatic Speech Recognition (ASR) Evaluation

Why use an ASR evaluation?

An ASR evaluation helps developers troubleshoot speech recognition issues and improve performance. It can also help you identify commonly misrecognized words, which leads to a better customer experience.

Why we created this evaluation

Evaluating the results from various speech recognition vendors is a challenging task. To make this process a little easier, we put together a utility, open-sourced on GitHub, that lets you evaluate speech recognition vendors and their results faster. It goes beyond the Word Error Rate (WER) figure you might see elsewhere, and it automatically pre-processes (normalizes) the text, which removes manual effort from the evaluation process.
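To give a feel for what text normalization involves before two transcripts are compared, here is a minimal sketch in Python. It is an illustration only: the function name normalize_transcript and the specific rules (lowercasing, dropping apostrophes, replacing other punctuation with spaces) are assumptions for this example, not a description of the rules the utility actually applies.

```python
import re
import unicodedata


def normalize_transcript(text: str) -> list[str]:
    """Return lowercase word tokens with punctuation removed.

    Hypothetical helper: the rules below are assumptions chosen for
    illustration, not the utility's actual normalization.
    """
    # Fold Unicode compatibility forms (for example, full-width letters).
    text = unicodedata.normalize("NFKC", text).lower()
    # Drop apostrophes so "It's" becomes "its" instead of two tokens.
    text = re.sub(r"['’]", "", text)
    # Replace any remaining punctuation with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Splitting on whitespace also collapses repeated spaces and newlines.
    return text.split()


print(normalize_transcript("Hello, World!  It's 5 o'clock."))
# ['hello', 'world', 'its', '5', 'oclock']
```

Without a step like this, differences in casing or punctuation between the human transcript and the vendor output would be counted as recognition errors.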

Metrics from an ASR evaluation

This utility can evaluate the results generated by any Speech-to-Text (STT) or Automatic Speech Recognition (ASR) system.

You will be able to calculate these metrics:

Word Error Rate (WER), the most common metric for measuring the performance of a speech recognition or machine translation system
Levenshtein distance calculated at the word level
Number of word-level insertions, deletions, and mismatches between the original and generated files
Number of phrase-level insertions, deletions, and mismatches between the original and generated files
Text Comparison to visualize the differences (color highlights)
Overall statistics for the original and generated files (bytes, characters, words, newlines, etc.)
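To make the first three metrics concrete, the sketch below computes the word-level Levenshtein distance with the standard dynamic-programming table, then walks back through the table to count insertions, deletions, and mismatches (called substitutions here). WER is that distance divided by the number of words in the original transcript. The names WordErrors and word_errors are hypothetical, and this is a textbook formulation written in Python for illustration, not the utility's own implementation.

```python
from dataclasses import dataclass


@dataclass
class WordErrors:
    substitutions: int
    deletions: int
    insertions: int
    reference_length: int

    @property
    def distance(self) -> int:
        # Word-level Levenshtein distance.
        return self.substitutions + self.deletions + self.insertions

    @property
    def wer(self) -> float:
        if self.reference_length == 0:
            raise ValueError("reference transcript is empty")
        return self.distance / self.reference_length


def word_errors(reference: list[str], hypothesis: list[str]) -> WordErrors:
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # cost[i][j] = edit distance between reference[:i] and hypothesis[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # delete every reference word
    for j in range(cols):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, rows):
        for j in range(1, cols):
            diff = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + diff,  # match or substitution
                cost[i - 1][j] + 1,         # deletion
                cost[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right corner to count each error type.
    i, j = len(reference), len(hypothesis)
    subs = dels = ins = 0
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and reference[i - 1] == hypothesis[j - 1]
                and cost[i][j] == cost[i - 1][j - 1]):
            i, j = i - 1, j - 1  # words match, no error
        elif i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + 1:
            subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return WordErrors(subs, dels, ins, len(reference))


original = "the quick brown fox jumps".split()
generated = "the quick brown box jumps over".split()
result = word_errors(original, generated)
print(result.substitutions, result.deletions, result.insertions)  # 1 0 1
print(result.distance, result.wer)  # 2 0.4
```

Because WER divides by the length of the original transcript, it can exceed 1.0 when the generated text contains many extra words.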

Installation

$ npm install -g speech-recognition-evaluation

What to expect

The simplest way to run your first evaluation is to pass the original and generated options to the asr-eval command. The original file is a human-generated plain text file containing the reference transcript. The generated file is also plain text, but it contains the transcript produced by the STT/ASR system.

$ asr-eval --original ./original-file.txt --generated ./generated-file
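Before comparing the two files, it can help to sanity-check that each one contains what you expect. The sketch below computes the kind of overall statistics listed among the metrics (bytes, characters, words, newlines) for a file's contents. The helper name text_stats and the assumption that the files are UTF-8 encoded are ours; this is a Python illustration, not the utility's code.

```python
def text_stats(raw: bytes) -> dict[str, int]:
    """Basic size statistics for a transcript, assuming UTF-8 encoding."""
    text = raw.decode("utf-8")
    return {
        "bytes": len(raw),
        "characters": len(text),
        "words": len(text.split()),
        "newlines": text.count("\n"),
    }


# For a real file: text_stats(pathlib.Path("original-file.txt").read_bytes())
print(text_stats("héllo world\n".encode("utf-8")))
# {'bytes': 13, 'characters': 12, 'words': 2, 'newlines': 1}
```

A large gap in word count between the original and generated files is often a sign of a truncated or mismatched transcript rather than a recognition problem.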

To perform your evaluation, visit the Speech Recognition Evaluation Library on GitHub.

Next steps

ASR evaluations can be confusing and time-consuming. We hope this utility makes the process easier and more convenient, and that it is useful as you explore the benefits of conversational intelligence in your own products. If you have not used them yet, free trial credits are available so you can try Symbl’s Platform today.

Learn more about our conversational intelligence solutions by visiting our developer documentation.

Toshish Jawale

Co-founder, CTO

Working on making machines understand humans. Building LLMs, Speech, and Conversation Understanding Models, and making them available in the easiest way for developers.