REST quickstart

Real-Time STT subscribes to the audio of a media stream and transcribes it into text in real time. This page shows you how to enable Real-Time STT using the basic RESTful API calls.

Note that the command-line examples are for demonstration purposes only. In a production environment, send RESTful API requests through your application server.

Understand the tech

The following diagram illustrates the complete process of implementing Real-Time STT:

[Diagram: real-time-stt-flow]

This includes the following steps:

  1. acquire

    Before starting the transcription, call acquire to obtain a transcription resource. If the request is successful, you get a resource ID in the response body.

  2. start

    Call start to join the channel and start the transcription. If the request is successful, you get a task ID in the response body that marks the current transcription session.

  3. update

    While a transcription session is running, call update to change certain parameters, such as the transcription language or whether to transcribe a specific host or all hosts in the channel.

  4. query

    Between the start and stop calls, you can query the service status and update configuration parameters.

  5. stop

    Call stop to stop the transcription.

Prerequisites

To follow this procedure, you must:

Project setup

To enable Real-Time STT before using it for the first time, take the following steps:

  1. Log in to Agora Console and open the Project Management page.
  2. Find the project for which you want to enable Real-Time STT and click the pencil icon.
  3. On the Edit Project page, find Real-Time STT and click Enable.
  4. Click Enable Real-Time STT and Apply to confirm.

Now you can use Agora Real-Time STT and see the usage statistics on the Usage page.

Implement Real-Time STT

The following diagram shows a simple API call sequence of Real-Time STT, including querying the status and updating the configuration before the transcription stops:

[Diagram: real-time-transcription]

Pass basic HTTP authentication

RESTful APIs require you to pass basic HTTP authentication by setting the Authorization parameter in every HTTP request header. For how to get the Authorization value, see RESTful authentication.
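
As a rough illustration of what an Authorization value looks like, the sketch below builds a standard HTTP Basic header (RFC 7617) by Base64-encoding a key and a secret joined by a colon. The function name `basic_auth_header` and the credential values are hypothetical; RESTful authentication remains the reference for which credentials to use.

```shell
# Build an HTTP Basic Authorization header from a key/secret pair.
# Assumption: the credentials are a key and a secret, as in standard Basic auth.
basic_auth_header() {
  # $1 = key (user ID), $2 = secret (password)
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Example with the well-known RFC 7617 sample credentials (not real keys).
basic_auth_header 'Aladdin' 'open sesame'
# With curl: --header "$(basic_auth_header "$KEY" "$SECRET")"
```

Stripping newlines from the `base64` output keeps the header on one line when the credentials are long.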

Acquire a token and reserve a resource

Call acquire to request a builderToken and reserve a resource for Real-Time STT.

After a successful request, you get tokenName, also referred to as a resource ID, in the response body. The resource ID is valid for five minutes only, so you need to start transcribing within that time. Because the stop request also requires the resource ID, store it until the transcription is stopped. One resource ID can be used only for one transcription session.

Request example


# instanceId is set by the developer: at most 64 characters from a-z, A-Z, 0-9, "-", "_".
# Using the channel name as the instanceId is recommended.
curl --location -g 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/builderTokens' \
--header 'Content-Type: application/json' \
--data '{
  "instanceId": "XXXX"
}'

Response example

  • Success


    {
    "createTs": 1678505791, // Timestamp at which the builderToken was created.
    "instanceId": "XXXX", // Same as in the acquire request.
    "tokenName": "nUwUbQf9Zg6tsgtLslGnDg0lk8RYaUE09pqOuSIgwfy1uJa4K6lWCzLqJNDen8tHgNjbAcOGIWpgWQEllEvR86LKWnExC9WFhPSQo0Eim0W2guETD_yO4hsHLKNpKvcTivXo5PNOYqLEANOdsLbU8pQ5fRgwcxplOVI_GT5MR6YhPT-2O4h64xTS3qpMZv1qtV8dLpcaxTKDwK5zckGk6PKjRycZ_BClZTTKlKXKkfPztQNwyKa00UJDJK5uyZqzExx-Q_PGQEB2r-u4oWriMaqmSo1M8ShsI4TX-920jE0MoB_JBb5GHQUpmHcZOJCTMO2SiKwZLzMK0F-jAaWYBbhRAu3hnQ_LjtcWvDJEDWkEJZonYjTfENjvwOsjFPvp"
    }

  • Failure


    {
    "message": string // Error reason
    }
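
Because the start and stop requests both need the resource ID, a script typically captures tokenName from the acquire response. The following is a minimal sketch, assuming a flat response shaped like the success example above; the helper name `json_string_field` and the shortened token are hypothetical.

```shell
# Read a top-level string field from a flat JSON object on stdin.
# A sed pattern is enough for the small responses on this page; prefer a real
# JSON parser for anything nested or containing escaped quotes.
json_string_field() {
  # $1 = field name
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample response shaped like the success example (token shortened for brevity).
response='{"createTs": 1678505791, "instanceId": "XXXX", "tokenName": "nUwUbQf9Zg6t"}'
token_name=$(printf '%s\n' "$response" | json_string_field tokenName)
echo "$token_name"
```

If the field is missing, as in a failure response, the helper prints nothing, so an empty `token_name` can be treated as a failed acquire.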

Start the service

Call start within five minutes after getting the resource ID to join the channel and start a transcription session.

After a successful request, you get taskId in the HTTP response body. This ID uniquely identifies the current transcription session.

Request example

This is a simple request example to start the Real-Time STT service. Refer to Encrypt captions, Record captions, and Transcribe specified hosts for more feature configurations.


# languages: at most two, for example "en-US" and "ru-RU". With two languages configured, the service automatically enables language detection at an additional cost.
# maxIdleTime: the service stops after the channel has had no host for this many seconds (60 here). Configurable range: 5 to 2592000 seconds.
# channelName: the RTC channel for which transcription needs to be enabled.
# subBotUid: the unique UID in the channel for the bot that subscribes to audio. Pass an int UID as a string.
# subBotToken: the RTC token the subscribing bot uses to join the channel. Optional, depending on the RTC channel security configuration.
# pubBotUid: the unique UID in the channel for the bot that publishes text. Pass an int UID as a string. Must differ from subBotUid.
# pubBotToken: the RTC token the publishing bot uses to join the channel. Apply the admin token.
curl --location -g 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks?builderToken={{tokenName}}' \
--header 'Content-Type: application/json' \
--data '{
  "languages": [
    "<YourTranscribeLanguages>"
  ],
  "maxIdleTime": 60,
  "rtcConfig": {
    "channelName": "{<YourChannelName>}",
    "subBotUid": "{<YourSubscribeBotUid>}",
    "subBotToken": "{<YourSubscribeBotRtcToken>}",
    "pubBotUid": "{{textUID}}",
    "pubBotToken": "{{audioUIDChannelToken}}"
  }
}'

Response example

  • Success


    {
    "taskId": "String", // The taskId
    "createTs": number, // The created timestamp
    "status": enum(STATUS) // The task status: IDLE, PREPARING, IN_PROGRESS, STOPPING, STOPPED
    }

  • Failure


    {
    "message": string // Error reason
    }
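
The start request carries a few constraints that are easy to check before sending it: the two bot UIDs must differ, maxIdleTime must fall within 5 to 2592000 seconds, and at most two languages can be configured. The sketch below checks these locally; the function name `validate_start_params` is hypothetical, and requiring at least one language is an assumption rather than something stated on this page.

```shell
# Check the start-request constraints from the request example above.
# Prints a reason and returns non-zero on the first violated rule.
validate_start_params() {
  # $1 = subBotUid, $2 = pubBotUid, $3 = maxIdleTime, remaining args = languages
  sub_uid=$1; pub_uid=$2; max_idle=$3
  shift 3
  if [ "$sub_uid" = "$pub_uid" ]; then
    echo "subBotUid and pubBotUid must be different"; return 1
  fi
  case $max_idle in
    ''|*[!0-9]*) echo "maxIdleTime must be an integer"; return 1 ;;
  esac
  if [ "$max_idle" -lt 5 ] || [ "$max_idle" -gt 2592000 ]; then
    echo "maxIdleTime must be between 5 and 2592000 seconds"; return 1
  fi
  # Assumption: at least one language is required; the documented maximum is two.
  if [ "$#" -lt 1 ] || [ "$#" -gt 2 ]; then
    echo "configure one or two languages"; return 1
  fi
  echo "ok"
}

validate_start_params 1000 2000 60 en-US ru-RU
```

Running the check before calling start avoids spending the five-minute resource ID window on a request that would be rejected anyway.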

Query service status

You can call query as many times as needed during a transcription session to check its status. After a successful request, you get the status and related information in the response body.

Request example


curl --location -g 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks/{{taskId}}?builderToken={{tokenName}}' \
--header 'Content-Type: application/json'

Response example

  • Success


    {
    "taskId": "String", // The taskId
    "createTs": number, // The created timestamp
    "status": enum(STATUS) // The task status: IDLE, PREPARING, RUNNING, STOPPING, STOPPED, RECONNECTING
    }

  • Failure


    {
    "message": string // Error reason
    }
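
Since query can be called repeatedly, one common pattern is to poll until the session reaches a given status. The sketch below is one possible way to do that, not a prescribed approach. The names `wait_for_status` and `query_status` and the variables APP_ID, TASK_ID, TOKEN_NAME, and AUTH_HEADER are hypothetical, and the code assumes the status field is returned as a JSON string.

```shell
# Poll until the status command prints the wanted status or attempts run out.
# $1 = wanted status, $2 = max attempts, $3 = seconds between attempts,
# remaining args = command that prints the current status.
wait_for_status() {
  wanted=$1; attempts=$2; delay=$3
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@")
    [ "$status" = "$wanted" ] && return 0
    i=$((i + 1))
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Hypothetical status command: queries the task and prints its "status" field.
# APP_ID, TASK_ID, TOKEN_NAME and AUTH_HEADER are assumed to be set by the caller.
query_status() {
  curl --silent --location \
    "https://api.agora.io/v1/projects/${APP_ID}/rtsc/speech-to-text/tasks/${TASK_ID}?builderToken=${TOKEN_NAME}" \
    --header "$AUTH_HEADER" \
    --header 'Content-Type: application/json' |
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage: wait_for_status STOPPED 10 3 query_status || echo "still not stopped"
```

Passing the status command as arguments keeps the loop independent of how the status is fetched, which also makes it easy to try out with a stub such as `echo STOPPED`.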

Update service

See Update service for details.

Stop the service

Call stop to stop transcribing. After a successful request, you get the status of the transcription session in the response body.

Request example


curl --location -g --request DELETE 'https://api.agora.io/v1/projects/{{appId}}/rtsc/speech-to-text/tasks/{{taskId}}?builderToken={{tokenName}}' \
--header 'Content-Type: application/json'
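
The request examples on this page share one base path and differ only in HTTP method, path suffix, and query string. The sketch below collects them in a single helper for reference; the function name `stt_endpoint` is hypothetical, and update is left out because its request is described on the Update service page.

```shell
# Print the HTTP method and URL for one of the calls shown on this page.
# $1 = acquire|start|query|stop, $2 = app ID, $3 = resource ID (tokenName), $4 = task ID
stt_endpoint() {
  base="https://api.agora.io/v1/projects/$2/rtsc/speech-to-text"
  case $1 in
    acquire) echo "POST $base/builderTokens" ;;
    start)   echo "POST $base/tasks?builderToken=$3" ;;
    query)   echo "GET $base/tasks/$4?builderToken=$3" ;;
    stop)    echo "DELETE $base/tasks/$4?builderToken=$3" ;;
    *)       echo "unknown call: $1" >&2; return 1 ;;
  esac
}

stt_endpoint stop myAppId myTokenName myTaskId
```

The POST and GET methods are inferred from the curl examples, which send a body for acquire and start and none for query; only stop sets its method explicitly.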

Considerations

API calls

  • Call start within five minutes of obtaining a resource ID. If the resource ID times out, request a new one.
  • A transcription session is started and stopped with the same resource ID, so a running session stops automatically when its resource ID expires. After a session starts, the resource ID is valid for 24 hours by default and for at most 48 hours.
  • String UIDs are supported only on a 128 host environment, with full support planned in the near future.
  • pubBotUid and subBotUid are int type UIDs and must differ from each other to avoid unexpected behavior.
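
The five-minute start window can be checked in a script before calling start. The following is a minimal sketch under two assumptions: that createTs in the acquire response is a Unix timestamp in seconds, and that the window is counted from createTs. The function name `resource_id_fresh` is hypothetical.

```shell
# Succeed while a resource ID is still inside its five-minute start window.
# $1 = createTs from the acquire response (assumed Unix seconds)
# $2 = current time in Unix seconds
resource_id_fresh() {
  age=$(( $2 - $1 ))
  [ "$age" -ge 0 ] && [ "$age" -lt 300 ]
}

# Usage: resource_id_fresh "$create_ts" "$(date +%s)" || echo "acquire a new resource ID"
```

Treating an age of exactly 300 seconds as expired errs on the side of requesting a fresh resource ID.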

See also

Sample project

Agora provides a Postman collection with sample RESTful API requests for Real-Time STT.

Demo app and source code

Check our demo to try out Real-Time STT and evaluate its accuracy and latency.

You can also refer to the demo code on GitHub to see how captions and transcription are implemented. For more demo code, contact support@agora.io.

REST API middleware

Agora Go Backend Middleware is an open-source microservice that exposes a RESTful API designed to simplify Real-Time STT interactions with Agora. Written in Golang and powered by the Gin framework, this community project serves as a middleware to bridge front-end applications using Agora's Video SDK or Voice SDK with Agora's RESTful APIs.
