Hosted Transcription Service


A prototype Speech Recognition system that automatically transcribes audio or video files. 

  • State-of-the-art Speech Recognition engine
  • Implemented virtually via the “Cloud” - no software installation required
  • Speaker-independent - no voice profile training
  • Creates text and an interactive Multimedia Transcript

How it works

To support the Liberated Learning Consortium’s objective of improving recognition accuracy, IBM Research engineered a next-generation Speech Recognition engine that could be deployed as a Hosted Transcription Service (HTS). Implemented virtually in a cloud computing environment, the system performs speaker-independent, offline transcription of audio and video files. It first converts the raw media (such as MP3, MPG, WMV, or AVI files) into a format suitable for processing. To improve recognition accuracy, HTS uses a double-pass decoding technique and dynamically adjusts to the speaker's voice, without requiring voice profile training or enrollment.
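The conversion step described above can be illustrated with a minimal sketch. The page does not say which tool or target format HTS uses, so everything here is an assumption: the sketch builds (but does not run) a command for the open-source ffmpeg tool that would extract a mono 16 kHz WAV track, a format commonly fed to speech recognizers. The function name, the list of accepted extensions, and the sample rate are hypothetical.

```python
from pathlib import Path

# Illustrative subset only; the page lists these formats as examples ("etc").
ACCEPTED_EXTENSIONS = {".mp3", ".mpg", ".wmv", ".avi"}


def build_convert_command(source, sample_rate=16000):
    """Return an ffmpeg argument list that extracts mono WAV audio.

    Assumption: ffmpeg and 16 kHz mono WAV stand in for whatever
    converter and format HTS actually uses internally.
    """
    src = Path(source)
    if src.suffix.lower() not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported media type: {src.suffix}")
    target = src.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(src),           # input media file
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to a single audio channel
        "-ar", str(sample_rate),  # resample the audio
        str(target),
    ]


command = build_convert_command("lecture01.AVI")
```

The resulting list could be passed to `subprocess.run` on a machine where ffmpeg is installed; the sketch stops short of that so it runs anywhere.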

To use the system, authenticated users visit an online portal, log in to their secure accounts, and then upload a media file for automatic transcription.

Multimedia Transcripts

Once HTS has converted and processed the recorded lecture, students participating in the project will receive Multimedia Transcripts generated by Speech Recognition.

Traditional transcripts are generated by converting a spoken language source (audio, video) into text, typically by listening and manually typing what is heard. In this project, Multimedia Transcripts refer to Speech Recognition generated text that is synchronized with a spoken language source and possibly other media (slides, images, etc.).
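To make "synchronized" concrete, here is a minimal sketch of how time-aligned text can be tied to playback. It assumes each recognized word carries start and end times in seconds; the `TimedWord` structure, the `word_at` helper, and the sample data are hypothetical and are not the HTS data format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float  # seconds from the beginning of the recording
    end: float


def word_at(words, t):
    """Return the index of the word being spoken at time t, or None.

    Assumes `words` is sorted by start time and does not overlap.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word starting at or before t
    if i >= 0 and t < words[i].end:
        return i
    return None  # before the first word, or in a pause between words


# Invented sample data for illustration.
transcript = [
    TimedWord("welcome", 0.00, 0.45),
    TimedWord("to", 0.45, 0.60),
    TimedWord("the", 0.60, 0.72),
    TimedWord("lecture", 0.80, 1.30),
]

current = word_at(transcript, 1.0)  # index of "lecture"
```

A player could call `word_at` on each playback tick to highlight the current word, or use a word's `start` value to seek the media when the reader clicks on the text.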

Students will use a Flash-based interactive interface that allows each user to customize how the information is viewed.

view HTS sample audio with SR generated text

view HTS sample video and SR generated text

[Note: depending on your browser, you may receive a certificate warning because the sample files are stored behind a secure firewall. It is safe to proceed to the sample.]

Based on the format of the recorded lecture and on the learner's preferences, content can be viewed in a number of predefined layouts or adjusted dynamically. Although the content is automatically synchronized by HTS, disaggregated content is also available, which allows students to self-select the individual learning objects and combinations (text only, text and audio, etc.) that suit their preferences.

HTS for Instructors / Institutions

In addition to creating accessible multimedia for students, HTS provides post-production tools that allow authors to add supporting content and format the newly created presentation. A "Presentation Package" that contains a set of converted Flash video and associated files (including transcript and timing data) can be downloaded by the author and repurposed for various teaching/learning applications. The package files can be distributed via a third-party server and viewed through other web services.

view HTS sample Presentation - audio, text, and slides

HTS includes development tools and examples that have helped Consortium members create customized SR-based transcription and captioning systems using Application Programming Interfaces (APIs). Consortium partners have integrated HTS components into existing lecture capture systems and e-learning processes to create more dynamic, accessible learning content.
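As an illustration of the kind of captioning tool that could be built on transcript and timing data, the sketch below groups timed words into caption cues in WebVTT, a standard web caption format. The HTS APIs and file formats are not documented on this page, so the input shape (a list of `(text, start, end)` tuples), the function names, and the choice of WebVTT are all assumptions.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(words, max_words=7):
    """Build a WebVTT document from (text, start, end) tuples.

    Each cue holds up to `max_words` words, spanning from the start of
    its first word to the end of its last word.
    """
    lines = ["WEBVTT", ""]
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0][1], group[-1][2]
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(" ".join(text for text, _, _ in group))
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Invented sample data for illustration.
captions = to_webvtt([("hello", 0.0, 0.4), ("world", 0.5, 1.0)])
```

Splitting on a fixed word count keeps the sketch short; a production captioning tool would more likely break cues at pauses or sentence boundaries.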

Please review the Hosted Transcription Service (HTS) Terms of Usage.

To learn about institutional applications, email liberated.learning@smu.ca