Hosted Transcription Service

Hosted Transcription Service (HTS) is a prototype Speech Recognition (SR) system that automatically transcribes audio or video into accessible Multimedia Transcripts. Traditional transcripts are typically produced by a person who listens to a recording and types what is heard. Multimedia Transcripts, by contrast, are SR-generated text synchronized with the spoken-language source.

Learn more about HTS.

HTS Usage Statistics

Researchers tracked HTS usage statistics and technical performance, including analysis of the recordings uploaded, their associated metadata, and Word Error Rates for SR-based transcription.

  • Throughout the project, 499 recordings containing 550 hours of audio were digitized and submitted for transcription.
  • The average recording was 62 minutes (standard deviation 37 minutes, indicating a wide range of recording lengths).
  • File uploads peaked in mid-November, early February, and mid-March. All three time periods preceded important university dates.
  • Recordings were made from a wide variety of academic disciplines, implying that students perceived potential value regardless of the course.
  • By project end, students were uploading over 20 hours of audio for transcription per user account.

Word Error Rates

Improving recognition accuracy or, conversely, decreasing the Word Error Rate (WER) is a longstanding goal of the scientific community. In any given transcription, some words are inevitably transcribed incorrectly. Analyzing WER and the possible sources of error is a key component of the quantitative evaluation. WER is defined as the number of incorrect (misrecognized) words divided by the total number of words spoken, multiplied by 100 to give a percentage.
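
The definition above can be illustrated with a minimal Python sketch. It is not the scoring procedure used by the HTS researchers, which this page does not describe. The function names (word_error_rate, accuracy), the lowercase and whitespace-split normalization, and the use of word-level edit distance to count incorrect words (substitutions, deletions, and insertions) are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage of the words in the reference transcript.

    Assumption: "incorrect words" are counted with a word-level edit
    distance (substitutions + deletions + insertions), a common convention
    that the source page does not confirm.
    """
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    errors = prev[-1]
    return 100.0 * errors / len(ref)


def accuracy(wer_percent: float) -> float:
    """Accuracy as the complement of WER (e.g. 15% WER -> 85% accuracy)."""
    return 100.0 - wer_percent


if __name__ == "__main__":
    spoken = "the quick brown fox jumps"
    recognized = "the quick brown box jumps"
    wer = word_error_rate(spoken, recognized)
    print(f"WER: {wer:.2f}%  accuracy: {accuracy(wer):.2f}%")
```

In this sketch, one misrecognized word out of five spoken words gives a WER of 20%. Because inserted words also count as errors under the edit-distance convention, a WER computed this way can exceed 100% when the recognizer outputs many extra words.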

  • The average WER for all files transcribed was 24.67%.
  • Results varied considerably, with a standard deviation of over 17 percentage points.
  • Variables that affected WER included recording quality, speaker accents, gender, and academic discipline.
  • The average WER for North American native speakers was 21.37%.
  • One third of all transcriptions had a WER of 15% or lower (at least 85% accuracy).

To learn more about research to improve WER, please visit Lecture / Language model development.