


Over the past few years, many automatic speech recognition (ASR) services have entered the market, offering a variety of different features. When deciding whether to use a service, you may want to evaluate its performance and compare it to another service. This evaluation process often analyzes a service along multiple vectors such as feature coverage, customization options, security, performance and latency, and integration with other cloud services.

Depending on your needs, you’ll want to check for features such as speaker labeling, content filtering, and automatic language identification. Basic transcription accuracy is often a key consideration during these service evaluations. In this post, we show how to measure the basic transcription accuracy of an ASR service in six easy steps, provide best practices, and discuss common mistakes to avoid.

The evaluation basics

Defining your use case and performance metric

Before starting an ASR performance evaluation, you first need to consider your transcription use case and decide how to measure a good or bad performance. Literal transcription accuracy is often critical. For example, how many word errors are in the transcripts? This question is especially important if you pay annotators to review the transcripts and manually correct the ASR errors, and you want to minimize how much of the transcript needs to be re-typed.

The most common metric for speech recognition accuracy is called word error rate (WER), which is recommended by the US National Institute of Standards and Technology for evaluating the performance of ASR systems. WER is the proportion of transcription errors that the ASR system makes relative to the number of words that were actually said. The lower the WER, the more accurate the system.

Reference transcript (what the speaker said): well they went to the store to get sugar

Hypothesis transcript (what the ASR service transcribed): they went to this tour kept shook or

In this example, the ASR service doesn’t appear to be accurate, but how many errors did it make? To quantify WER, there are three categories of errors:

Substitutions – When the system transcribes one word in place of another. Transcribing the fifth word as "this" instead of "the" is an example of a substitution error.
Deletions – When the system misses a word entirely. In the example, the system deleted the first word, "well".
Insertions – When the system adds a word into the transcript that the speaker didn’t say, such as "or" inserted at the end of the example.

Of course, counting errors in terms of substitutions, deletions, and insertions isn’t always straightforward.
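To make the WER definition in this post concrete, here is a minimal Python sketch. It assumes the usual formulation, WER = (substitutions + deletions + insertions) / number of reference words, and it counts errors as the minimum word-level edit distance (Levenshtein distance) between the two transcripts. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not part of any particular ASR service or scoring tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, using word-level edit distance.

    Assumption: transcripts are already normalized (case, punctuation)
    and can be tokenized by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = minimum edits to turn the reference words seen so far
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


reference = "well they went to the store to get sugar"
hypothesis = "they went to this tour kept shook or"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

For the example transcripts, the minimum alignment needs six edits against nine reference words, so this sketch reports a WER of about 0.667. A manual count can label individual errors differently (for instance, treating the trailing "or" as an insertion), which is one reason counting by hand is not always straightforward.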
