Pricing
This page introduces the billing policy for the Real-Time STT add-on provided by Agora.
Your billing details may differ if you have signed a contract with Agora.
Overview
Agora calculates billing for all projects under your Agora account on a monthly basis. Billing begins as soon as you enable Real-Time STT.
Transcription fee
When Real-Time STT is enabled for a channel, it transcribes the audio of all active hosts in that channel. When Real-Time STT is enabled for specific hosts, it transcribes only the audio of those hosts and ignores the others. The Real-Time STT service uses algorithms that remove periods of silence and improve the WER (word error rate) of the transcription. The duration of the processed audio that the Real-Time STT engine transcribes is called the transcription duration. Agora charges for the transcription duration of all hosts, or of the specified hosts, in the channel.
The unit price is as follows:
Billing item | Usage, minutes per month | Pricing, US$/1,000 minutes |
---|---|---|
Transcription duration | Above 0 | 16.99 |
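The monthly calculation described above can be sketched as follows. This is a minimal illustration, assuming usage is already available as hypothetical `(project_id, month, minutes)` records; the record shape and the function names are made up for this example and are not part of any Agora API. The sketch sums transcription minutes across all projects for each month and applies the list price from the table. Contract pricing, discounts, and rounding are not modeled because this page does not specify them.

```python
from collections import defaultdict
from decimal import Decimal

# List price from the table above (US$ per 1,000 minutes).
# Contract pricing and discounts are not modeled.
TRANSCRIPTION_PRICE_PER_1000_MIN = Decimal("16.99")


def monthly_transcription_minutes(records):
    """Sum transcription minutes across all projects, grouped by month.

    `records` is a hypothetical iterable of (project_id, month, minutes) tuples.
    """
    totals = defaultdict(Decimal)  # a new month starts at Decimal("0")
    for _project_id, month, minutes in records:
        totals[month] += Decimal(str(minutes))
    return dict(totals)


def transcription_fee_usd(minutes):
    """Single "Above 0" tier: fee = minutes / 1,000 * unit price."""
    return Decimal(str(minutes)) / 1000 * TRANSCRIPTION_PRICE_PER_1000_MIN


if __name__ == "__main__":
    usage = [
        ("project-a", "2024-05", 1200),
        ("project-b", "2024-05", 800),
        ("project-a", "2024-06", 500),
    ]
    for month, minutes in monthly_transcription_minutes(usage).items():
        print(month, minutes, transcription_fee_usd(minutes))
```

Because the table has a single "Above 0" tier, the fee grows linearly with the monthly total: 2,000 minutes come to US$33.98 and 500 minutes to US$8.495 at list price. `Decimal` is used so that fractional-cent amounts are not distorted by binary floating point.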
Example
Suppose you enable Real-Time STT for a channel that lasts 10 minutes and has three active hosts:
- Host A speaks for 2 minutes and remains silent for 8 minutes.
- Host B speaks for 3 minutes and remains silent for 7 minutes.
- Host C speaks for 3 minutes and remains silent for 7 minutes.
- All hosts are silent for the first 2 minutes of the call.
In this case, the total transcription duration is 2 (Host A) + 3 (Host B) + 3 (Host C) = 8 minutes. The silent periods of each host, including the time spent listening to others, do not count toward the transcription duration.
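To make the counting rule concrete, the sketch below models each host's speaking time as intervals within the 10-minute call and sums only those intervals. The exact start and end times are hypothetical: the example gives only each host's total speaking time and the silent first 2 minutes. The function name is illustrative, and the optional `hosts` argument mirrors enabling Real-Time STT for specific hosts instead of the whole channel.

```python
# Hypothetical speaking intervals (start_min, end_min) in a 10-minute call.
# Only the per-host totals (2, 3, 3) and the silent first 2 minutes come from
# the example; the exact timings are made up for illustration.
SPEECH_INTERVALS = {
    "A": [(2, 4)],
    "B": [(4, 7)],
    "C": [(7, 10)],
}


def transcription_minutes(speech_intervals, hosts=None):
    """Sum speaking time; silence (including time spent listening) adds nothing.

    If `hosts` is given, only those hosts count, as when Real-Time STT is
    enabled for specific hosts rather than for the whole channel.
    """
    selected = speech_intervals if hosts is None else {h: speech_intervals[h] for h in hosts}
    return sum(end - start for intervals in selected.values() for start, end in intervals)


if __name__ == "__main__":
    print(transcription_minutes(SPEECH_INTERVALS))         # 8 (whole channel)
    print(transcription_minutes(SPEECH_INTERVALS, ["A"]))  # 2 (host A only)
```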
Notes:
- WER measures the accuracy of an STT engine: the lower the WER, the better.
- Real-Time STT does not incur an additional RTC audio fee.
- Avoid enabling Real-Time STT for channels or hosts that stay silent for long periods. In the example, the Real-Time STT worker still processes all hosts' audio during the first 2 minutes to remove the silent portions. Agora charges for these 2 minutes as STT engine standby time, billed at US$0.99/1,000 minutes with the same discount applied as for RTC audio.
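The charges in this example can be tallied as follows. This is a sketch at list prices only: it assumes the standby time is the 2 minutes stated in the note (the note does not say whether standby is counted per host), and it leaves out the RTC audio discount because that rate depends on your account. The constant and function names are illustrative.

```python
from decimal import Decimal

TRANSCRIPTION_PRICE = Decimal("16.99")  # US$ per 1,000 minutes
STANDBY_PRICE = Decimal("0.99")         # US$ per 1,000 minutes, before the RTC audio discount


def charge_usd(minutes, price_per_1000):
    """Per-minute charge: minutes / 1,000 * unit price."""
    return Decimal(minutes) / 1000 * price_per_1000


def example_charges(transcription_min=8, standby_min=2):
    """List-price charges for the example; the RTC audio discount is not applied."""
    transcription = charge_usd(transcription_min, TRANSCRIPTION_PRICE)
    standby = charge_usd(standby_min, STANDBY_PRICE)
    return {"transcription": transcription, "standby": standby, "total": transcription + standby}


if __name__ == "__main__":
    for item, amount in example_charges().items():
        print(item, amount)
```

Under these assumptions the example comes to US$0.13592 for transcription plus US$0.00198 for standby, about US$0.14 in total before any discount.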
Language identification fee
Real-Time STT supports dynamic language detection when two or more languages are enabled for a channel or specific hosts. The LID (language identification) duration is the same as the transcription duration.
Billing item | Usage, minutes per month | Pricing, US$/1,000 minutes |
---|---|---|
Language identification duration | Above 0 | 5.00 |
Examples:
- Suppose a channel exists for 10 minutes and has three active hosts, A, B, and C, all unmuted.
- If Spanish and Chinese LID is enabled for this channel from the start, the algorithm removes 8 minutes of silent audio for host A, 7 minutes for host B, and 7 minutes for host C. The transcription duration is therefore 2 + 3 + 3 = 8 minutes. The LID duration is also 8 minutes: 2 minutes for host A, 3 minutes for host B, and 3 minutes for host C.
- If Spanish and Chinese LID is enabled only for host A, the transcription duration and the LID duration are both 2 minutes.
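The two LID examples above can be checked with a short sketch. It reuses the speaking minutes from the example (2, 3, and 3 minutes after silence removal), assumes transcription and LID are enabled for the same hosts so that the LID duration equals the transcription duration, and prices both at list rates. The names are illustrative.

```python
from decimal import Decimal

TRANSCRIPTION_PRICE = Decimal("16.99")  # US$ per 1,000 minutes
LID_PRICE = Decimal("5.00")             # US$ per 1,000 minutes

# Speaking minutes left after silence removal in the 10-minute example.
SPEAKING_MINUTES = {"A": 2, "B": 3, "C": 3}


def stt_and_lid(hosts):
    """Return (transcription_min, lid_min, fee_usd) for the given hosts.

    Assumes LID is enabled for the same hosts as transcription, so the LID
    duration equals the transcription duration. List prices, no discounts.
    """
    minutes = sum(SPEAKING_MINUTES[h] for h in hosts)
    lid_minutes = minutes
    fee = (Decimal(minutes) / 1000 * TRANSCRIPTION_PRICE
           + Decimal(lid_minutes) / 1000 * LID_PRICE)
    return minutes, lid_minutes, fee


if __name__ == "__main__":
    print(stt_and_lid(["A", "B", "C"]))  # LID enabled for the whole channel
    print(stt_and_lid(["A"]))            # LID enabled for host A only
```

For the whole channel this gives 8 minutes of each and US$0.17592 at list price; for host A alone, 2 minutes of each and US$0.04398.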
Notes:
- The Real-Time STT transcription duration does not change if you enable more than one language.
- If only one language is set for a channel or a specified host, language detection does not start.
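Taken together, these notes amount to a simple rule, sketched below under the assumption that languages are given as a list of names. The function name and the plain-text language labels are placeholders, not the identifiers the Real-Time STT API expects.

```python
def billable_durations(transcription_minutes, languages):
    """Return (transcription_min, lid_min) for one channel or host.

    - The transcription duration does not depend on how many languages are set.
    - LID runs only when two or more distinct languages are set; otherwise
      there is no LID duration.
    """
    lid_minutes = transcription_minutes if len(set(languages)) >= 2 else 0
    return transcription_minutes, lid_minutes


if __name__ == "__main__":
    print(billable_durations(8, ["Spanish", "Chinese"]))  # (8, 8)
    print(billable_durations(8, ["Spanish"]))             # (8, 0)
```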
Free-of-charge duration
Real-Time STT provides 300 minutes of free-of-charge duration for integration and testing purposes.
Contact sales@agora.io or your Agora account executive (AE) to request a discount.