REST quickstart

Real-Time STT subscribes to the audio content of a media stream and transcribes it into text in real time. This page shows you how to use the basic RESTful API calls to enable Real-Time STT.

Note that the command-line examples are for demonstration purposes only. In a production environment, send RESTful API requests through your application server.

Understand the tech

The following diagram illustrates the complete process of implementing Real-Time STT:

[Diagram: real-time-stt-flow]

This includes the following steps:

acquire

Before starting the transcription, call acquire to obtain a transcription resource. If the request is successful, you get a resource ID in the response body.

start

Call start to join the channel and start the transcription. If the request is successful, you get a task ID in the response body that identifies the current transcription session.

update

While a transcription session is running, call update to change parameters such as the language, or whether to transcribe a specific host or all hosts in the channel.

query

Between the start and stop calls, call query to check the service status.

stop

Call stop to stop the transcription.
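
The call order above can be sketched as a small wrapper. This is a minimal illustration, not part of the Agora API: `run_session`, the `api` object with its `acquire`, `start`, `query`, and `stop` methods, and `RecordingApi` are hypothetical names, and the stand-in client records calls instead of sending HTTP requests.

```python
class RecordingApi:
    """Hypothetical stand-in client: records calls instead of sending HTTP."""

    def __init__(self):
        self.calls = []

    def acquire(self, instance_id):
        self.calls.append("acquire")
        return "resource-1"  # made-up resource ID

    def start(self, resource_id, body):
        self.calls.append("start")
        return "task-1"  # made-up task ID

    def query(self, task_id, resource_id):
        self.calls.append("query")
        return "IN_PROGRESS"

    def stop(self, task_id, resource_id):
        self.calls.append("stop")


def run_session(api, instance_id, start_body, work):
    """Run acquire -> start -> work -> stop, stopping even if work raises."""
    resource_id = api.acquire(instance_id)
    task_id = api.start(resource_id, start_body)
    try:
        # query and update calls belong here, between start and stop.
        return work(task_id, resource_id)
    finally:
        # stop needs the same resource ID that was used for start.
        api.stop(task_id, resource_id)


if __name__ == "__main__":
    api = RecordingApi()
    status = run_session(api, "demo-channel", {}, api.query)
    print(status, api.calls)
```

Wrapping the work in try/finally is one way to make sure a started session is always stopped, which is why the resource ID is kept until the end.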

Prerequisites

To follow this procedure, you must:

Have a valid Agora Account.
Have a valid Agora project with an app ID and a temporary token or a token server. For details, see Agora account management.
Have a computer with access to the internet. If your network has a firewall, follow the steps in Firewall requirements.
Join a Video SDK channel as a host and start streaming. Refer to the Voice SDK quickstart guide.

Project setup

To enable Real-Time STT before using it for the first time, take the following steps:

Log in to Agora Console and open the Project Management page.
Find the project for which you want to enable Real-Time STT and click the pencil icon.
On the Edit Project page, find Real-Time STT and click Enable.
Click Enable Real-Time STT and Apply to confirm.

Now you can use Agora Real-Time STT and see the usage statistics on the Usage page.

Implement Real-Time STT

The following diagram shows a simple API call sequence of Real-Time STT, including querying the status and updating the configuration before the transcription stops:

[Diagram: real-time-transcription]

Pass basic HTTP authentication

RESTful APIs require you to pass basic HTTP authentication by setting the Authorization parameter in every HTTP request header. For how to get the Authorization value, see RESTful authentication.
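
As a rough sketch of what that header looks like: HTTP Basic authentication (RFC 7617) Base64-encodes `username:password` and prefixes the result with `Basic`. The snippet below assumes the credential pair is a customer ID and a customer secret; the function name and sample values are placeholders, and the RESTful authentication page remains the authority on where the real values come from.

```python
import base64


def basic_auth_header(customer_id: str, customer_secret: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617)."""
    # Assumption: the credential pair is a customer ID and customer secret.
    raw = f"{customer_id}:{customer_secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(basic_auth_header("my-customer-id", "my-customer-secret"))
```

Keep the secret on your application server; the resulting header can then be attached to each request described below.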

Acquire a token and reserve a resource

Call acquire to request a builderToken and reserve a resource for Real-Time STT.

After a successful request, you get tokenName, also referred to as a resource ID, in the response body. The resource ID is valid for five minutes only, so you need to start transcribing within that time. Because the stop request also requires the resource ID, store it until the transcription is stopped. One resource ID can be used only for one transcription session.

Request example

# instanceId: set by the developer; maximum length is 64 characters (a-z, A-Z, 0-9, "-", "_").
# Using the channel name as the instanceId is recommended.
curl --location -g 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/builderTokens' \
--header 'Authorization: Basic {{base64Credentials}}' \
--header 'Content-Type: application/json' \
--data '{
    "instanceId": "XXXX"
}'

Response example

Success

{
    "createTs": 1678505791, // Timestamp at which the builderToken was created.
    "instanceId": "XXXX", // Same as in the acquire request.
    "tokenName": "nUwUbQf9Zg6tsgtLslGnDg0lk8RYaUE09pqOuSIgwfy1uJa4K6lWCzLqJNDen8tHgNjbAcOGIWpgWQEllEvR86LKWnExC9WFhPSQo0Eim0W2guETD_yO4hsHLKNpKvcTivXo5PNOYqLEANOdsLbU8pQ5fRgwcxplOVI_GT5MR6YhPT-2O4h64xTS3qpMZv1qtV8dLpcaxTKDwK5zckGk6PKjRycZ_BClZTTKlKXKkfPztQNwyKa00UJDJK5uyZqzExx-Q_PGQEB2r-u4oWriMaqmSo1M8ShsI4TX-920jE0MoB_JBb5GHQUpmHcZOJCTMO2SiKwZLzMK0F-jAaWYBbhRAu3hnQ_LjtcWvDJEDWkEJZonYjTfENjvwOsjFPvp"
}

Failure

{
    "message": string // Error reason
}
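
On an application server, the same acquire call might be prepared along these lines. This is a sketch under stated assumptions: `build_acquire_request` and `parse_acquire_response` are hypothetical helper names, the URL and field names are copied from the examples above, and the request is only built here, not sent.

```python
import json
import re
import urllib.request

# Host and path are taken from the curl example above.
BASE_URL = "https://api.agora.io/v1/projects"


def build_acquire_request(app_id, instance_id, authorization):
    """Prepare (but do not send) the acquire request."""
    # instanceId: at most 64 characters from a-z, A-Z, 0-9, "-", "_".
    if not re.fullmatch(r"[A-Za-z0-9_-]{1,64}", instance_id):
        raise ValueError("instanceId must be 1-64 characters of a-z, A-Z, 0-9, '-', '_'")
    url = f"{BASE_URL}/{app_id}/rtsc/speech-to-text/builderTokens"
    body = json.dumps({"instanceId": instance_id}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": authorization}
    return urllib.request.Request(url, data=body, method="POST", headers=headers)


def parse_acquire_response(raw):
    """Return tokenName (the resource ID) or raise with the error message."""
    payload = json.loads(raw)
    if "tokenName" not in payload:
        raise RuntimeError(payload.get("message", "acquire failed"))
    return payload["tokenName"]


if __name__ == "__main__":
    req = build_acquire_request("my-app-id", "my-channel", "Basic <credentials>")
    print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; store the returned resource ID, because the later calls need it.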

Start the service

Call start within five minutes after getting the resource ID to join the channel and start a transcription session.

After a successful request, you get taskId in the HTTP response body. This ID uniquely identifies the current transcription session.

Request example

This is a simple request example to start the Real-Time STT service. Refer to Encrypt captions, Record captions, and Transcribe specified hosts for more feature configurations.

# languages: at most 2 languages, for example "en-US" and "ru-RU". If 2 languages are configured, the service automatically enables language detection at an additional cost.
# maxIdleTime: if the channel has no host, the service stops after this many seconds (60 here). The configurable range is 5 to 2592000 seconds.
# channelName: the RTC channel name for which transcription is enabled.
# subBotUid: the unique UID in the channel for the bot that subscribes to audio. Pass an int UID as a string.
# subBotToken: the RTC token for subBot to join the RTC channel. Optional, depending on the RTC channel security configuration.
# pubBotUid: the unique UID in the channel for the bot that publishes text. Pass an int UID as a string. subBotUid and pubBotUid MUST be different.
# pubBotToken: the RTC token for pubBot to join the RTC channel. Apply the admin token.
curl --location -g 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks?builderToken={{tokenName}}' \
--header 'Authorization: Basic {{base64Credentials}}' \
--header 'Content-Type: application/json' \
--data '{
    "languages": [
        "<YourTranscribeLanguages>"
    ],
    "maxIdleTime": 60,
    "rtcConfig": {
        "channelName": "{<YourChannelName>}",
        "subBotUid": "{<YourSubscribeBotUid>}",
        "subBotToken": "{<YourSubscribeBotRtcToken>}",
        "pubBotUid": "{{textUID}}",
        "pubBotToken": "{{audioUIDChannelToken}}"
    }
}'

Response example

Success

{
    "taskId": "String", // The taskId
    "createTs": number, // The created timestamp
    "status": enum(STATUS) // The task status: IDLE, PREPARING, IN_PROGRESS, STOPPING, STOPPED
}

Failure

{
    "message": string // Error reason
}
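
The constraints noted in the start request can be checked before the request is sent. The following is a minimal sketch: `build_start_body` is a hypothetical helper, the field names come from the example above, and two points are assumptions rather than documented behavior: that at least one language is required, and that a token field is simply left out when no token is supplied.

```python
import json

MAX_LANGUAGES = 2                    # "Can configure 2 languages max"
MIN_IDLE_S, MAX_IDLE_S = 5, 2592000  # configurable maxIdleTime range in seconds


def build_start_body(languages, channel_name, sub_bot_uid, pub_bot_uid,
                     max_idle_time=60, sub_bot_token=None, pub_bot_token=None):
    """Return the JSON body for start, rejecting values the notes rule out."""
    # Assumption: at least one language must be given.
    if not 1 <= len(languages) <= MAX_LANGUAGES:
        raise ValueError("configure one or two languages")
    if not MIN_IDLE_S <= max_idle_time <= MAX_IDLE_S:
        raise ValueError("maxIdleTime must be between 5 and 2592000 seconds")
    # Int UIDs are sent as strings, and the two bots need different UIDs.
    sub_uid, pub_uid = str(sub_bot_uid), str(pub_bot_uid)
    if sub_uid == pub_uid:
        raise ValueError("subBotUid and pubBotUid must be different")
    rtc_config = {"channelName": channel_name, "subBotUid": sub_uid, "pubBotUid": pub_uid}
    # Assumption: token fields are omitted when the caller supplies none.
    if sub_bot_token is not None:
        rtc_config["subBotToken"] = sub_bot_token
    if pub_bot_token is not None:
        rtc_config["pubBotToken"] = pub_bot_token
    return json.dumps({"languages": list(languages),
                       "maxIdleTime": max_idle_time,
                       "rtcConfig": rtc_config})


if __name__ == "__main__":
    print(build_start_body(["en-US"], "demo-channel", 1001, 1002))
```

Because the body is produced with `json.dumps`, it contains no comments or trailing commas and can be sent as the request data unchanged.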

Query service status

You can call query multiple times during a transcription session to check its status. After a successful request, you get the status and related information in the response body.

Request example

curl --location -g 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks/{{taskId}}?builderToken={{tokenName}}' \
--header 'Authorization: Basic {{base64Credentials}}' \
--header 'Content-Type: application/json'

Response example

Success

{
    "taskId": "String", // The taskId
    "createTs": number, // The created timestamp
    "status": enum(STATUS) // The task status: IDLE, PREPARING, RUNNING, STOPPING, STOPPED, RECONNECTING
}

Failure

{
    "message": string // Error reason
}
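
Since query can be called repeatedly, one possible pattern is to poll until the session reaches a status you are waiting for. The sketch below is illustrative only: `query_url` and `wait_for_status` are hypothetical names, and `fetch_status` stands for whatever function your server uses to send the GET request and return the raw response body.

```python
import json
import time
import urllib.parse


def query_url(app_id, task_id, builder_token):
    """Build the query URL in the same shape as the curl example above."""
    path = f"{urllib.parse.quote(app_id, safe='')}/rtsc/speech-to-text/tasks/{urllib.parse.quote(task_id, safe='')}"
    query = urllib.parse.urlencode({"builderToken": builder_token})
    return f"https://api.agora.io/v1/projects/{path}?{query}"


def wait_for_status(fetch_status, wanted, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Poll fetch_status() until the reported status is in `wanted`."""
    for attempt in range(attempts):
        payload = json.loads(fetch_status())
        if "status" not in payload:
            # Failure responses carry a "message" field instead.
            raise RuntimeError(payload.get("message", "query failed"))
        if payload["status"] in wanted:
            return payload["status"]
        if attempt < attempts - 1:
            sleep(delay_s)
    raise TimeoutError(f"status not in {sorted(wanted)} after {attempts} attempts")


if __name__ == "__main__":
    # Canned responses stand in for real HTTP calls.
    canned = iter([b'{"status": "PREPARING"}', b'{"status": "RUNNING"}'])
    print(wait_for_status(lambda: next(canned), {"RUNNING"}, delay_s=0))
```

The set of statuses to wait for is left to the caller because the start and query response examples list slightly different status names.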

Update service

See Update service for details.

Stop the service

Call stop to stop transcribing. After a successful request, you get the status of the transcription session in the response body.

Request example

curl --location -g --request DELETE 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks/{{taskId}}?builderToken={{tokenName}}' \
--header 'Authorization: Basic {{base64Credentials}}' \
--header 'Content-Type: application/json'
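
The stop call differs from the others mainly in its HTTP method. A minimal sketch of preparing it on a server, assuming a hypothetical helper named `build_stop_request` and the URL shape from the example above (the request is built, not sent):

```python
import urllib.parse
import urllib.request


def build_stop_request(app_id, task_id, builder_token, authorization):
    """Prepare (but do not send) the DELETE request that stops a session."""
    # The resource ID from acquire is passed as the builderToken parameter.
    query = urllib.parse.urlencode({"builderToken": builder_token})
    url = (
        "https://api.agora.io/v1/projects/"
        f"{urllib.parse.quote(app_id, safe='')}/rtsc/speech-to-text/tasks/"
        f"{urllib.parse.quote(task_id, safe='')}?{query}"
    )
    headers = {"Content-Type": "application/json", "Authorization": authorization}
    return urllib.request.Request(url, method="DELETE", headers=headers)


if __name__ == "__main__":
    req = build_stop_request("my-app-id", "my-task-id", "my-resource-id", "Basic <credentials>")
    print(req.get_method(), req.full_url)
```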

Considerations

API calls

Call the start method within 5 minutes of obtaining a resource ID. If the 5 minutes elapse, request a new resource ID.
A transcription session is started and stopped with the same resource ID, so a running session stops automatically when its resource ID expires. The default validity period is 24 hours; the maximum is 48 hours.
String UIDs are supported only on a 128 host environment, with full support planned in the near future.
pubBotUid and subBotUid are int type UIDs that must be different to avoid unexpected behavior.
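
The five-minute start window can be tracked with a small check like the one below. It is a sketch built on assumptions: that createTs in the acquire response is a Unix timestamp in seconds, that the window is counted from that timestamp, and that your server clock is reasonably in sync. The function names and the safety margin are hypothetical.

```python
import time

START_WINDOW_S = 5 * 60  # call start within 5 minutes of acquire


def seconds_left_to_start(create_ts, now=None):
    """Seconds remaining before the resource ID can no longer be used for start."""
    # Assumption: create_ts is a Unix timestamp in seconds.
    now = time.time() if now is None else now
    return max(0.0, create_ts + START_WINDOW_S - now)


def must_reacquire(create_ts, now=None, safety_margin_s=10):
    """True when too little time is left and a new resource ID should be requested."""
    return seconds_left_to_start(create_ts, now) <= safety_margin_s


if __name__ == "__main__":
    acquired_at = time.time() - 120  # pretend acquire happened two minutes ago
    print(round(seconds_left_to_start(acquired_at)), must_reacquire(acquired_at))
```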
