Auto-Captioning

Introduction

Auto-captioning sends audio from a meeting to a real-time transcription service, which uses artificial intelligence to convert the speech to text. The transcription is then either sent back to the video conferencing application or displayed in a separate application, such as a web browser.

Auto-captioning can be useful in situations where:

  • Ad hoc meetings do not allow time to book captioning by a third party
  • The contents of the meeting are confidential and participants don't want a third party listening in
  • 100% accuracy is not critical, such as a catch-up with a work colleague

MS Teams and Auto-Captioning

Hard of hearing staff members have said that the auto-captioning feature in MS Teams has been a huge help in their meetings. They note that having the other participants turn on captioning helps a lot, because seeing their own words misinterpreted makes people slow down and speak more clearly.

We tested MS Teams with a script of 1,300 words and found that it was 97% accurate (counting words only, not punctuation). However, the missing 3% makes a big difference. Auto-captioning struggles in particular with non-standard words, such as names.
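To put the 97% figure in perspective, the short calculation below estimates how many words would be wrong in the 1,300-word test script. It is a simple sketch that assumes "accuracy" means the share of words transcribed correctly; the function name estimated_errors is illustrative only.

```python
def estimated_errors(total_words: int, accuracy: float) -> int:
    """Estimate how many words are transcribed incorrectly.

    Assumption: accuracy is the fraction of words transcribed correctly.
    This is a simplification; formal word error rate measures also count
    inserted words.
    """
    return round(total_words * (1 - accuracy))


# The 1,300-word test script at 97% accuracy
print(estimated_errors(1300, 0.97))
```

That is roughly 39 incorrect words, or about one error every 33 words, and those errors tend to fall on names and other uncommon words that matter most to the meaning.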

Auto-captioning has so far been enabled for specific staff, together with their colleagues and students. Infrastructure Services are currently working on enabling this for all staff and students.

An MS Teams meeting with auto-captioning

MS Teams captions are most effective if everybody in the meeting has them turned on. That way, participants can check how accurately their own speech is being captioned.

Costs

Auto-captions are free within MS Teams. They are currently available to all staff and students.

Pros

  • Cost. It is cheaper than live captioning by a human captioner.
  • Ease of use. Captions can be turned on by users themselves.
  • Captions are integrated into the Teams interface.

Cons

  • Accuracy. Auto-transcripts are only about 85 to 90% accurate, which can be a deal breaker, especially when discipline-specific vocabulary is involved.
  • Engagement. Students with hearing impairments advise that as soon as they see a video is auto-captioned, they turn it off.


Zoom Meetings and Auto-Captioning

Zoom does not currently have auto-captioning built in.

Otter.ai

Otter.ai offers an automatic transcription service for Zoom. Although it advertises the service as live captioning, it is really just a live transcript.

The difference between captions and transcripts is that captions appear at the bottom of the video, below the speaker, whereas transcripts appear in a panel beside the video player. This makes it harder for users to glance between the speaker and the text.

Otter live transcript in web browser next to Zoom window

Costs

Otter offers live transcription as part of its Otter for Teams product.

The cost for educational institutions is US$50 (about A$70) per person per year.

Pros

  • Cost. It is cheaper than live captioning by a human captioner.
  • Ease of use. Transcripts can be set up by users themselves.

Cons

  • Accuracy. Auto-transcripts are only about 85 to 90% accurate, which can be a deal breaker, especially when discipline-specific vocabulary is involved.
  • Engagement. Students with hearing impairments advise that as soon as they see a video is auto-captioned, they turn it off.
  • Usability. Unlike captions, which are overlaid on the video content, transcripts appear off to the side, making it harder for users to follow both things at the same time.
  • Cost. Even though it is cheaper than human captioning, it is still relatively expensive given that no humans are involved.

Auto-Captioning FAQ

Q. If MS Teams has auto-captioning built in, why don't we just get everyone in the Uni to use that?

A. Each video conferencing platform has its strengths and weaknesses. Whilst MS Teams has good auto-captioning, it doesn't allow live remote captions to be integrated. In addition, whilst MS Teams may be preferred by some hearing-impaired staff, Zoom has better support for screen reader users. Meeting organisers need to be mindful of the needs and preferences of meeting participants and select the platform best suited to each meeting.

Q. If auto-captioning is so good, why do we need live remote captioning?

A. Auto-captioning is about 85 to 95% accurate, whereas live remote captioning by a human is about 99% accurate. The 5 to 15% error rate makes a huge difference. Auto-captioning is most likely to make errors with uncommon words, such as domain-specific vocabulary.
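The gap is easier to see as a count of errors. The sketch below converts the accuracy figures quoted in this answer into approximate errors per 1,000 spoken words, again assuming that accuracy is simply the share of words transcribed correctly. The function name errors_per_thousand is illustrative only.

```python
def errors_per_thousand(accuracy_percent: int) -> int:
    """Approximate errors per 1,000 words for a given accuracy.

    Assumption: accuracy is the percentage of words transcribed correctly.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (100 - accuracy_percent) * 1000 // 100


comparisons = [
    ("Auto-captioning (low end)", 85),
    ("Auto-captioning (high end)", 95),
    ("Live remote captioning", 99),
]

for label, accuracy in comparisons:
    print(f"{label}: {accuracy}% accurate -> about "
          f"{errors_per_thousand(accuracy)} errors per 1,000 words")
```

Even at the high end, auto-captioning produces about five times as many errors as a human captioner (50 versus 10 per 1,000 words), and at the low end about fifteen times as many.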

Contact Us

For assistance or to report accessibility problems please contact:

Andrew Normand
Web Accessibility Lead
Email: anormand@unimelb.edu.au
Phone: +61 3 9035 4867