Pricing

This page introduces the billing policy for the Real-Time STT add-on provided by Agora.

Your billing details may differ if you have signed a contract with Agora.

Overview

Agora calculates the billing of all projects under your Agora account on a monthly basis. Billing begins once you enable Real-Time STT.

Transcription fee

When Real-Time STT is enabled for a channel, it transcribes the audio of all active hosts in that channel. When Real-Time STT is enabled for specific hosts, it transcribes only the audio of those hosts and ignores the others. The Real-Time STT service uses algorithms that remove periods of silence and improve the WER (Word Error Rate) of the transcription. The duration of the processed audio that the Real-Time STT engine transcribes is referred to as the transcription duration. Agora charges for the transcription duration of all hosts, or of the specified hosts, in the channel.

The unit price is as follows:

Billing item              Usage (minutes per month)    Price (US$ per 1,000 minutes)
Transcription duration    Above 0                      16.99

Examples:

  • Suppose a channel exists for 10 minutes and has 3 active hosts (A, B, and C), all unmuted.
  • #1: If Real-Time STT is enabled for this channel from the start, the algorithm removes 8 minutes of silent audio for host A, 7 minutes for host B, and 7 minutes for host C. The transcription duration is therefore (10 - 8) + (10 - 7) + (10 - 7) = 2 + 3 + 3 = 8 minutes.
  • #2: If Real-Time STT is enabled only for host A, the algorithm removes 8 minutes of silent audio for host A. The transcription duration is 10 - 8 = 2 minutes.
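
The arithmetic in examples #1 and #2 can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not an Agora API: the function and variable names are hypothetical, and the sketch ignores the free-of-charge duration, contract pricing, and any rounding Agora may apply on an invoice.

```python
# Unit price from the table above: US$16.99 per 1,000 minutes.
TRANSCRIPTION_PRICE_PER_1000_MIN = 16.99


def transcription_minutes(enabled_minutes, silence_removed_by_host):
    """Sum, over the transcribed hosts, the audio left after silence removal.

    `silence_removed_by_host` maps a host name to the minutes of silence
    removed for that host (hypothetical input shape, for illustration only).
    """
    return sum(enabled_minutes - silence
               for silence in silence_removed_by_host.values())


def transcription_fee(minutes):
    """Fee in US$ for a given transcription duration, before any discount."""
    return minutes * TRANSCRIPTION_PRICE_PER_1000_MIN / 1000


# Example #1: Real-Time STT enabled for the whole channel (hosts A, B, C).
example_1 = transcription_minutes(10, {"A": 8, "B": 7, "C": 7})  # 8 minutes

# Example #2: Real-Time STT enabled for host A only.
example_2 = transcription_minutes(10, {"A": 8})  # 2 minutes

print(example_1, example_2)
print(f"{transcription_fee(example_1):.5f}")
```

At the listed unit price, the 8 minutes in example #1 correspond to roughly US$0.136 of transcription fee.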

Notes:

  • WER (Word Error Rate) measures the transcription errors of an STT engine: the lower the WER, the more accurate the transcription.
  • Enabling Real-Time STT for a channel or host that is quiet for long periods is not recommended. In this case, the audio is processed and removed, and the STT engine runs in standby mode. Agora charges for this standby duration at US$0.99 per 1,000 minutes. The standby duration is the enable duration minus the transcription duration: in example #1, 10 - 8 = 2 minutes; in example #2, 10 - 2 = 8 minutes.
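
The standby calculation in the note above can be sketched as follows. The helper names are hypothetical, and the sketch simply mirrors the note's arithmetic, in which the enable duration is the 10 minutes for which Real-Time STT was enabled. It is not an official billing formula.

```python
# Standby price from the note above: US$0.99 per 1,000 minutes.
STANDBY_PRICE_PER_1000_MIN = 0.99


def standby_minutes(enable_minutes, transcription_minutes):
    """Standby duration = enable duration - transcription duration."""
    return enable_minutes - transcription_minutes


def standby_fee(minutes):
    """Fee in US$ for a given standby duration, before any discount."""
    return minutes * STANDBY_PRICE_PER_1000_MIN / 1000


# Example #1: enabled for 10 minutes, 8 minutes transcribed.
print(standby_minutes(10, 8))  # 2

# Example #2: enabled for 10 minutes, 2 minutes transcribed.
print(standby_minutes(10, 2))  # 8
```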

Language identification fee

Real-Time STT supports dynamic language detection when two or more languages are enabled for a channel or specific hosts. The LID (language identification) duration is the same as the transcription duration.

Billing item                        Usage (minutes per month)    Price (US$ per 1,000 minutes)
Language identification duration    Above 0                      5.00

Examples:

  • Suppose a channel exists for 10 minutes and has 3 active hosts (A, B, and C), all unmuted.
  • #3: If LID for Spanish and Chinese is enabled for this channel from the start, the algorithm removes 8 minutes of silent audio for host A, 7 minutes for host B, and 7 minutes for host C. The transcription duration is therefore 2 + 3 + 3 = 8 minutes. The LID duration is also 8 minutes: 2 minutes for host A, 3 minutes for host B, and 3 minutes for host C.
  • If LID for Spanish and Chinese is enabled only for host A, the transcription duration and the LID duration are both 2 minutes.

Notes:

  • The Real-Time STT transcription duration does not change if you enable more than 1 language.
  • If only 1 language is set for a channel or a specified host, the language detection will not start.
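
As a rough sketch of how the two fees combine, the following Python snippet applies the rule that the LID duration equals the transcription duration when two or more languages are enabled, and is zero otherwise. All names are hypothetical and the language labels are plain strings chosen for illustration. The sketch leaves out standby charges, the free-of-charge duration, and any contract terms.

```python
# Unit prices from the tables on this page, in US$ per 1,000 minutes.
TRANSCRIPTION_PRICE_PER_1000_MIN = 16.99
LID_PRICE_PER_1000_MIN = 5.00


def lid_minutes(transcription_minutes, languages):
    """LID runs only when two or more languages are enabled."""
    return transcription_minutes if len(languages) >= 2 else 0


def transcription_and_lid_fee(transcription_minutes, languages):
    """Combined transcription and LID fee in US$, before any discount."""
    lid = lid_minutes(transcription_minutes, languages)
    return (transcription_minutes * TRANSCRIPTION_PRICE_PER_1000_MIN
            + lid * LID_PRICE_PER_1000_MIN) / 1000


# Example #3: 8 minutes transcribed with Spanish and Chinese enabled.
print(lid_minutes(8, ["Spanish", "Chinese"]))  # 8
print(f"{transcription_and_lid_fee(8, ['Spanish', 'Chinese']):.5f}")

# A single language: LID does not start, so only transcription is counted.
print(lid_minutes(8, ["Spanish"]))  # 0
```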

Free-of-charge duration

Real-Time STT provides 300 minutes of free-of-charge duration for integration and testing purposes.

To ask about a discount, contact sales@agora.io or your account executive (AE).
