Zoom Meetings and Live Transcription

Live Transcription vs. Captioning

Transcription is the process of converting audio into text. Captioning takes a transcript, adds non-speech audio information (such as laughter or music), and divides it into time-coded segments.
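
The difference can be illustrated with a minimal sketch that turns a transcript into time-coded caption segments. It assumes the WebVTT caption format as the output; the Cue class, the helper names, and the sample lines are hypothetical and are not part of Zoom.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str     # speech, or non-speech information in brackets


def timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Render the cues as the text of a WebVTT caption file."""
    blocks = ["WEBVTT"]
    for cue in cues:
        blocks.append(f"{timestamp(cue.start)} --> {timestamp(cue.end)}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical sample: two spoken lines and one non-speech segment.
cues = [
    Cue(0.0, 2.5, "Welcome to today's lecture."),
    Cue(2.5, 4.0, "[door closes]"),
    Cue(4.0, 7.25, "Let's start with last week's reading."),
]
print(to_webvtt(cues))
```

A plain transcript would contain only the two spoken sentences. The caption file adds the bracketed non-speech segment and a start and end time for each segment.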

On most video players, clicking the CC (Closed Caption) button displays a pre-existing text transcript. In Zoom, the CC button is used for both live transcription and third-party captioning.

Enabling Live Transcription

Only the meeting host can enable live transcription. To do so, click the CC button, which provides four options:

  1. Assign a participant to type. This is not recommended, as the typist is highly unlikely to have the required speed and accuracy.
  2. I will type. This is not recommended.
  3. Copy the API token. This allows a third-party captioning provider to provide Live Remote Captioning.
  4. Enable Auto-Transcription. This allows an AI service to convert audio into a text transcript. Note: Live transcription is not supported in breakout rooms.
Screenshot showing options available after clicking the CC button
Figure 1: Clicking on the Closed Caption button displays a list of transcription and captioning options.
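
For option 3, the following is a rough sketch of how a captioning provider's software might use the copied token. It assumes the token is a URL that accepts caption text as a plain-text POST, with a sequence number (seq) and a language code (lang) added as query parameters, which is how Zoom's third-party captioning documentation has described it. Check the current Zoom documentation before relying on this. The function name and the placeholder URL are hypothetical, and the sketch builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_caption_request(token_url: str, seq: int, text: str, lang: str = "en-US") -> Request:
    """Build (but do not send) a POST request carrying one line of caption text."""
    # Assumption: the copied token is a full URL that already has a query string,
    # so the extra parameters are appended with "&".
    url = f"{token_url}&{urlencode({'seq': seq, 'lang': lang})}"
    return Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


# Hypothetical placeholder; a real token is copied from the CC menu by the host.
TOKEN_URL = "https://example.invalid/closedcaption?id=PLACEHOLDER"

req = build_caption_request(TOKEN_URL, seq=1, text="Welcome to today's lecture.")
print(req.get_method(), req.full_url)
# Sending the request would be a separate step, e.g. urllib.request.urlopen(req).
```

In this sketch the sequence number would increase by one for each caption line sent, so that lines can be kept in order.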

Participant Requests

Participants may request live transcription if you have not already enabled it. If it is requested, you will be prompted to respond with 'Enable', 'Decline', or 'Decline and don’t ask again'. This request can be made anonymously, so you may not see the name of the participant requesting this feature.

Screenshot of participant transcription request

Caption Preferences

You can adjust the font size of closed captions by going to Settings > Accessibility and moving the font size slider.

Screenshot of Zoom Accessibility settings


Are automatically generated captions OK for accessibility purposes?

The issue with AI-generated transcriptions is accuracy. They tend to be about 90 to 95% accurate, which is too low for users who are deaf or hard of hearing. Students who are hard of hearing have advised that as soon as they see that a video is auto-captioned, they turn it off.
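
To put the accuracy figures above in concrete terms, here is a small worked calculation. It treats accuracy as the share of words transcribed correctly and assumes a speaking rate of 150 words per minute and a 50-minute lecture; all three are illustrative assumptions, not measurements.

```python
def expected_errors(words: int, accuracy_percent: int) -> float:
    """Expected number of wrongly transcribed words at a given word accuracy."""
    return words * (100 - accuracy_percent) / 100


WORDS_PER_MINUTE = 150  # assumed speaking rate
LECTURE_MINUTES = 50    # assumed lecture length

for accuracy in (90, 95):
    per_minute = expected_errors(WORDS_PER_MINUTE, accuracy)
    per_lecture = expected_errors(WORDS_PER_MINUTE * LECTURE_MINUTES, accuracy)
    print(f"{accuracy}% accurate: {per_minute} errors per minute, {per_lecture} per lecture")
```

Under these assumptions, a 90% accurate transcript gets about one word wrong every four seconds, and a 95% accurate transcript still contains several hundred errors over a full lecture.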

Automated live transcription is beneficial to people for whom English is a second language, and to people who are meeting in noisy or quiet environments.

Why use automatically generated captions if they are not accurate?

Some users who are deaf or hard of hearing find that AI-generated captions are useful for ad hoc meetings, where there isn't time to arrange captioning by a third party. There is therefore a trade-off between accuracy and the convenience of captioning on demand.

Will automatically generated transcripts replace live remote captioning provided by Disability Support?

No. Automated live transcription is not a replacement for live remote captioning, particularly where a high degree of accuracy is required, such as in a lecture or tutorial. As a result, captioning services currently provided to students via Disability Support will not change.