Real-Time Transcription takes the audio content of a host's media stream and transcribes it into written words in real time. This page shows you how to start and stop Real-Time Transcription in your app, through a business server, then display the text in your app.
Understand the tech
To start transcribing the audio in a channel in real time, you send an HTTP request to the Agora SD-RTN™ through your business server. Real-Time Transcription provides the following modes:
- Transcribe speech in real time, then stream this data to the channel.
- Transcribe speech in real time, store the text in the WebVTT format, and upload the file to third-party cloud storage.
Real-Time Transcription transcribes at most three speakers in a channel. When there are more than three speakers, the top three are selected based on volume, and their audio is transcribed.
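The selection rule can be pictured with a short sketch. It only illustrates "top three by volume"; the service performs the selection internally, and the function name, the UID keys, and the volume scale used here are assumptions.

```python
from typing import Dict, List

MAX_SPEAKERS = 3  # limit stated in the paragraph above


def select_speakers(volumes: Dict[str, float], limit: int = MAX_SPEAKERS) -> List[str]:
    """Return the UIDs of the loudest speakers, loudest first.

    `volumes` maps a UID to a volume reading (hypothetical scale).
    """
    ranked = sorted(volumes, key=lambda uid: volumes[uid], reverse=True)
    return ranked[:limit]
```

With four active speakers, only the three loudest are returned; with three or fewer, all of them are.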
The following figure shows the workflow to start, query, and stop a Real-Time Transcription task:
In order to use the RESTful API to transcribe speech, make the following calls:
acquire: Request a builderToken that authenticates the user and gives permission to start Real-Time Transcription. You must call start with this builderToken within five minutes.
start: Begin the transcription task. Once you start a task, builderToken remains valid for the entire session. Use the same builderToken to query and stop the task.
query: Check the task status.
stop: Stop the transcription task.
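The call order above can be sketched as a small wrapper that a business server might keep per task. This is a minimal sketch under stated assumptions, not the REST API itself: the `send` transport, the response field names, and the payload shapes are hypothetical, and a real transport would issue the HTTP requests described in the REST API documentation.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: takes a call name and a payload and returns the
# parsed response. A real one would send an HTTP request from the server.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]

START_WINDOW_SECONDS = 5 * 60  # start must follow acquire within five minutes


class TranscriptionTask:
    """Tracks builderToken and taskId across acquire, start, query, stop."""

    def __init__(self, send: Transport, clock: Callable[[], float] = time.monotonic):
        self._send = send
        self._clock = clock
        self.builder_token: Optional[str] = None
        self.task_id: Optional[str] = None
        self._acquired_at: Optional[float] = None

    def acquire(self) -> str:
        response = self._send("acquire", {})
        self.builder_token = response["builderToken"]  # assumed response field
        self._acquired_at = self._clock()
        return self.builder_token

    def start(self, config: Dict[str, Any]) -> str:
        if self.builder_token is None or self._acquired_at is None:
            raise RuntimeError("call acquire() before start()")
        if self._clock() - self._acquired_at > START_WINDOW_SECONDS:
            raise RuntimeError("builderToken is older than five minutes; acquire again")
        payload = {"builderToken": self.builder_token, **config}
        response = self._send("start", payload)
        self.task_id = response["taskId"]  # assumed response field
        return self.task_id

    def query(self) -> Dict[str, Any]:
        return self._send("query", self._task_payload())

    def stop(self) -> Dict[str, Any]:
        return self._send("stop", self._task_payload())

    def _task_payload(self) -> Dict[str, Any]:
        if self.task_id is None:
            raise RuntimeError("no task has been started")
        # The same builderToken is reused for query and stop.
        return {"builderToken": self.builder_token, "taskId": self.task_id}
```

Injecting the transport and the clock keeps the ordering and the five-minute check testable without a network connection.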
In order to set up Real-Time Transcription in your app, you must have:
Enabled Real-Time Transcription for your project:
Activated a supported cloud storage service to record and store Real-Time Transcription videos and texts
Installed the Protobuf package to generate code classes for displaying transcription text.
To run the post-processing script, install:
- Python 3.0
Implement a business server
You create a business server as a bridge between your app and Agora Real-Time Transcription. Implementing a business server to manage Real-Time Transcription provides the following benefits:
- Improved security, as values such as your taskId are not exposed to the client.
- Token processing is securely handled on the business server.
- Avoid splicing complex request body strings on the client side to reduce the probability of errors.
- Implement additional functionality on the business server. For example, billing for Real-Time Transcription use, or checking a user's privileges and payment status.
- If the REST API is updated, you do not need to update the client.
Agora Real-Time Transcription supports only integer UIDs:
- When you join a channel in your app, use an integer value for your UID. For example,
- When you start a Real-Time Transcription session, set uid to the same integer UID enclosed in quotation marks. For example,
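The relationship between the two values can be sketched as follows. The helper name and the UID `12345` are hypothetical, and where `uid` sits in the start request body is not shown here; see the REST API documentation for that.

```python
import json


def uid_for_request(uid: int) -> str:
    """Return the channel UID as the quoted string expected for `uid`."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(uid, bool) or not isinstance(uid, int):
        raise TypeError("UID must be an integer")
    return str(uid)


channel_uid = 12345  # hypothetical integer UID used when joining the channel
fragment = json.dumps({"uid": uid_for_request(channel_uid)})
```

Here `fragment` serializes the UID with quotation marks, while the app still joins the channel with the plain integer.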
To obtain sample code for your business server, see:
- Real-Time Transcription business server demo - an example business server that follows the workflow described in this document
- Postman Collection - API reference and code examples for the language you want to program in.
Use Google Protobuf Generator to parse text data
Google Protocol buffers are an extensible and language-neutral mechanism for serializing transcription data. Protobuf enables you to generate source code in multiple languages, based on a specified structure. For more information about Google protocol buffers, see protobuf.dev.
Agora provides the following Protobuf template for parsing Real-Time Transcription data:
To read and display the Real-Time Transcription text in your client:
Copy the Protobuf template to a local .proto file.
In your file, edit the following properties to match your project:
package: The source code package namespace.
option: The desired language options.
Generate a Protobuf class.
You run the protoc protocol compiler on your .proto file to generate the code that you need to work with the defined message types. The protoc compiler is invoked as follows:
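A typical invocation, wrapped in Python so that it can be scripted, might look like the sketch below. The file name `transcription.proto` and the output directory are hypothetical. `--proto_path` and `--python_out` are standard protoc options; other target languages use their own output flags, such as `--java_out`.

```python
import subprocess
from typing import List


def protoc_command(proto_file: str, out_dir: str = ".", proto_path: str = ".") -> List[str]:
    """Build the protoc argument list for generating Python classes."""
    return [
        "protoc",
        f"--proto_path={proto_path}",  # directory searched for .proto files
        f"--python_out={out_dir}",     # where the generated module is written
        proto_file,
    ]


def generate(proto_file: str, out_dir: str = ".") -> None:
    """Run protoc; requires the protoc binary to be installed and on PATH."""
    subprocess.run(protoc_command(proto_file, out_dir), check=True)
```

Calling `generate("transcription.proto")` would write the generated module to the current directory.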
Agora also provides Protobuf sample code to parse and display transcription text. To obtain the sample code, contact firstname.lastname@example.org
Use the Protobuf class to read transcription text.
When transcription text is available, your app receives the onStreamMessage callback. You use the generated Protobuf class in your app to read the byte data returned by the callback. Refer to the API reference for callback details.
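As a rough illustration of the decoding step, the sketch below parses the callback bytes with a generated message class. The class name and its fields depend on the Protobuf template and on your `package` and `option` settings, so the class is passed in as a parameter here; `ParseFromString` is the standard parsing method on generated Python Protobuf messages. How the bytes reach this function from the callback is platform-specific and not shown.

```python
from typing import Any, Type


def decode_stream_message(message_cls: Type[Any], data: bytes) -> Any:
    """Parse the byte data from the callback into a generated message."""
    message = message_cls()
    message.ParseFromString(data)  # fills the message fields from the bytes
    return message
```

In a client written in another language, the equivalent generated parsing call takes the place of `ParseFromString`.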
Synchronize transcription files with the cloud recording
The m3u8+vtt file generated by Real-Time Transcription and the m3u8+ts file generated by Cloud Recording are two independent files. The timestamp references in these media files are different and not synchronized. The cloud recording timestamp starts at 0, while the m3u8+vtt file uses the system timestamp. If either process starts abnormally, the media files generated by the two services may be out of sync during playback.
Post-processing ensures synchronization of subtitles and recorded audio. It enables you to associate the m3u8+ts file generated by cloud recording with the m3u8+vtt file generated by Real-Time Transcription.
Agora provides a post-processing script that enables you to synchronize the two files.
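To show why the two timelines disagree, the sketch below rebases a system timestamp onto a timeline that starts at 0 and formats the result as a WebVTT timestamp. It is not the method used by the Agora script. It assumes that both values are in milliseconds on the same clock and that the recording start time is known; the function and parameter names are hypothetical.

```python
def to_recording_time(system_ms: int, recording_start_ms: int) -> str:
    """Convert a system timestamp to an HH:MM:SS.mmm offset from recording start."""
    offset_ms = system_ms - recording_start_ms
    if offset_ms < 0:
        raise ValueError("cue precedes the start of the recording")
    seconds, millis = divmod(offset_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
```

A cue stamped 65,250 ms after the recording started maps to `00:01:05.250` on the recording timeline.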
Run the post-processing script
To synchronize files generated by Real-Time Transcription, take the following steps:
Unzip the post-processing script to a local folder.
Run the script on your Real-Time Transcription files:
If ffmpeg/ffprobe are not in your PATH, use –ffmpeg_path to specify the path.
Play the synchronized files:
Start the HTTP server by running the following command:
In your browser, enter the following URL:
This section contains information that completes the information in this page, or points you to documentation that explains other aspects of this product.
Refer to the Real-Time Transcription REST API documentation for parameter details.
- Android: onStreamMessage
- Electron: onStreamMessage
- Flutter: onStreamMessage
- iOS: receiveStreamMessageFromUid
- Unity: onStreamMessage
- Windows: onStreamMessage
List of supported languages
Use the following language codes in the recognizeConfig.language parameter of the start request. The current version supports at most two languages, separated by commas.
|Language|Code|
|Chinese (Cantonese, Traditional)|zh-HK|
|Chinese (Mandarin, Simplified)|zh-CN|
|Chinese (Taiwanese Putonghua)|zh-TW|
|Korean (South Korea)|ko-KR|
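The two-language limit and the comma-separated format can be checked on the business server before the start request is sent. The helper below is a minimal sketch with a hypothetical name; it does not validate the codes against the supported list.

```python
from typing import Sequence

MAX_LANGUAGES = 2  # the current version supports at most two languages


def language_param(codes: Sequence[str]) -> str:
    """Join one or two language codes into the comma-separated value."""
    if not 1 <= len(codes) <= MAX_LANGUAGES:
        raise ValueError(f"expected 1 to {MAX_LANGUAGES} language codes, got {len(codes)}")
    return ",".join(codes)
```

For example, `language_param(["zh-CN", "ko-KR"])` produces the value to place in recognizeConfig.language.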
Supported third-party cloud storage services
The following third-party cloud storage service providers are supported: