Parse transcription data

Agora uses Protocol Buffers (protobuf) to serialize transcription data. Protobuf, developed by Google, is a language-neutral, platform-independent way to serialize structured data. It enables efficient, consistent data handling across platforms by generating source code in multiple programming languages. Learn more at protobuf.dev.

Understand the tech

Agora Real-Time STT provides the SttMessage.proto file, which defines the message format for speech-to-text results. Transcribed text is serialized in this format, as compact binary or as JSON, and sent through the data stream. This guide explains how to generate code for your target language with the Protobuf compiler protoc, deserialize the received data stream, and extract text fields from the deserialized data structure.

Prerequisites

To follow this procedure, you must:

  • Enable Real-Time STT for your app.

  • Install the Protobuf compiler to generate code classes that process transcription text.

    Info: Because Protobuf formats and generated code can differ between versions, make sure the Protobuf version used to generate the code matches the Protobuf version your client uses for deserialization.

Parse transcription data using Protobuf

Follow these steps to write a script that calls the protoc compiler to generate code in different languages.

Create a Protobuf definition file

Protobuf allows you to generate source code in your preferred language based on the structure defined in the .proto file. Agora provides the following Protobuf definition for parsing Real-Time Transcription data. To use the file for generating code:

  1. Copy the following Protobuf definition to a local SttMessage.proto file:


    syntax = "proto3";

    package Agora.SpeechToText;
    option objc_class_prefix = "Stt";
    option csharp_namespace = "AgoraSTTSample.Protobuf";
    option java_package = "io.agora.rtc.speech2text";
    option java_outer_classname = "AgoraSpeech2TextProtobuffer";

    message Text {
      int32 vendor = 1;
      int32 version = 2;
      int32 seqnum = 3;
      int64 uid = 4;
      int32 flag = 5;
      int64 time = 6;
      int32 lang = 7;
      int32 starttime = 8;
      int32 offtime = 9;
      repeated Word words = 10;
      bool end_of_segment = 11;
      int32 duration_ms = 12;
      string data_type = 13;
      repeated Translation trans = 14;
      string culture = 15;
      int64 text_ts = 16;
    }
    message Word {
      string text = 1;
      int32 start_ms = 2;
      int32 duration_ms = 3;
      bool is_final = 4;
      double confidence = 5;
    }
    message Translation {
      bool is_final = 1;
      string lang = 2;
      repeated string texts = 3;
    }

    For a description of each field in the SttMessage.proto file, browse the Reference section.

  2. Edit the following properties in your .proto file to match your project:

    • package: The source code package namespace.
    • option: The desired language options.

Generate source code script

Create a shell script named generate_code.sh with the following content:

#!/bin/sh
# Specify the path to the protoc compiler.
# This example uses Protobuf version 21.12. Replace it with the version you need.
PROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc
# Specify the path to the .proto file.
# For a detailed description of the data structure, see the Reference section.
PROTO_FILE=./SttMessage.proto
# Specify the output directory.
JAVA_OUT_DIR=$(pwd)/code/java
# Create the output directory if it doesn't exist.
mkdir -p "$JAVA_OUT_DIR"
# Generate Java code.
"$PROTOC_PATH" --java_out="$JAVA_OUT_DIR" "$PROTO_FILE"
# Output a message once code generation is finished.
echo "Code generation completed."

Run the script

To generate a Protobuf class, run these commands in your terminal:


# Make the script executable
chmod +x generate_code.sh

# Run the script
./generate_code.sh

Deserialize transcription data

When transcription text is available, your Video SDK event handler receives the stream message callback. Use the generated Protobuf class to deserialize the received data and convert it back into a data structure or object.

// Join the channel and add callback events
rtcManager.joinChannel(roomName, localUid, agora_token, roleType.equals(ROLE_TYPE_BROADCAST), new RtcManager.OnChannelListener() {
    ...
    // Callback for receiving data stream messages
    @Override
    public void onStreamMessage(int uid, int streamId, byte[] data) {
        // Check if the remote user ID matches the specified streaming bot ID; if so, decode the data stream into a text object
        if (String.valueOf(uid).equalsIgnoreCase(RTC_UID_STT_STREAM)) {
            AgoraSpeech2TextProtobuffer.Text text = STTManager.getInstance().parseTextByte(roomName, data);
            // Convert the parsed text object into JSON format and log it
            LogUtil.d(originLogName, mGson.toJson(text));
        }
    }
    // ...
});

public AgoraSpeech2TextProtobuffer.Text parseTextByte(String channel, byte[] data) {
    // Declare a variable of type AgoraSpeech2TextProtobuffer.Text to store the deserialized object
    AgoraSpeech2TextProtobuffer.Text textStream;
    try {
        // Deserialize the byte array data into an AgoraSpeech2TextProtobuffer.Text object
        textStream = AgoraSpeech2TextProtobuffer.Text.parseFrom(data);
    } catch (Exception ex) {
        notifyErrorHandler(new ErrorInfo("parseTextByte", "-1", "Error parsing from parseTextByte >> " + ex.toString()));
        return null;
    }
    // ...
}
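Once a Text object has been deserialized, the transcript itself sits in its repeated words field. The following is a minimal sketch of extracting it, not Agora's implementation. SketchWord and TranscriptSketch are hypothetical names, and SketchWord is a stand-in for the generated Word class so that the example compiles without the Protobuf runtime. The sketch assumes the word texts can be concatenated without a separator; adjust this if your results need spaces between entries.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the generated AgoraSpeech2TextProtobuffer.Word class.
// With the generated class, call getText() and getIsFinal() instead of
// text() and isFinal(), and obtain the list from Text.getWordsList().
record SketchWord(String text, boolean isFinal) {}

class TranscriptSketch {
    // Concatenate the text of every word entry in one message.
    // Assumption: entries are joined without a separator.
    static String joinText(List<SketchWord> words) {
        return words.stream().map(SketchWord::text).collect(Collectors.joining());
    }

    // Treat the message as final only if it has entries and all of them are flagged final.
    static boolean isFinal(List<SketchWord> words) {
        return !words.isEmpty() && words.stream().allMatch(SketchWord::isFinal);
    }

    public static void main(String[] args) {
        List<SketchWord> words = List.of(
                new SketchWord("Hello, ", true),
                new SketchWord("world.", true));
        System.out.println(joinText(words) + " final=" + isFinal(words));
    }
}
```

In the parseTextByte flow above, a helper like joinText would be applied to the list returned by the deserialized object's getWordsList() accessor.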

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.

The following tables describe the fields in the SttMessage.proto file.

Text message fields

| Field name | Type | Description |
| --- | --- | --- |
| vendor | int32 | Reserved field. |
| version | int32 | Reserved field. |
| seqnum | int32 | Reserved field. |
| uid | int64 | The user ID to which the text corresponds. |
| flag | int32 | Reserved field. |
| time | int64 | The start time of the transcription of this segment. It has a value only when is_final is true; otherwise, it is 0. |
| lang | int32 | Reserved field. |
| starttime | int32 | Reserved field. |
| offtime | int32 | Reserved field. |
| words | repeated Word | An array of transcription results. See Word message fields for details. |
| end_of_segment | bool | Reserved field. |
| duration_ms | int32 | The time taken to transcribe, in milliseconds. |
| data_type | string | The data type: transcribe (transcription) or translate (text translation). |
| trans | repeated Translation | An array of translation results. See Translation message fields for details. |
| culture | string | The source language of the transcription. |
| text_ts | int64 | The timestamp of the transcription, used to align the source and target text during real-time translation. |
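Because data_type distinguishes transcription messages from translation messages, client code typically branches on it before reading words or trans. The sketch below illustrates one way to do that. SketchMessage and DataTypeRouter are hypothetical names, SketchMessage is a simplified stand-in for the generated Text class, and the output format is an arbitrary choice for the example.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified stand-in for the generated Text message.
// With the generated class, read these values through getUid(), getDataType(),
// getWordsList() and getTransList().
record SketchMessage(long uid, String dataType, List<String> wordTexts, List<String> translatedTexts) {}

class DataTypeRouter {
    // Build a display line for one message, or return empty for an unknown data_type.
    static Optional<String> route(SketchMessage msg) {
        switch (msg.dataType()) {
            case "transcribe":
                return Optional.of("[" + msg.uid() + "] " + String.join("", msg.wordTexts()));
            case "translate":
                return Optional.of("[" + msg.uid() + "] (translation) "
                        + String.join("", msg.translatedTexts()));
            default:
                // Assumption of this sketch: unknown values are skipped, not treated as errors.
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        SketchMessage msg = new SketchMessage(42L, "transcribe", List.of("Hi ", "there"), List.of());
        route(msg).ifPresent(System.out::println);
    }
}
```

Skipping unknown values keeps the client tolerant of data types it does not recognize.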

Word message fields

| Field name | Type | Description |
| --- | --- | --- |
| text | string | The result of transcription. |
| start_ms | int32 | Reserved field. |
| duration_ms | int32 | Reserved field. |
| is_final | bool | Whether this sentence is the final result of transcription. true: this is the final result. false: not the final result. When this field is true, the transcription engine considers the result finalized, but this does not imply that the sentence has ended semantically. |
| confidence | double | The speech-to-text engine’s confidence in the transcription result. Value range: [0,1]. |
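The is_final flag is commonly used to separate interim text, which may still change, from committed text. The following sketch shows one possible caption buffer built on that idea. CaptionBuffer is a hypothetical class, and it assumes that each non-final result supersedes the previous non-final result; verify this against the messages your app actually receives.

```java
import java.util.ArrayList;
import java.util.List;

class CaptionBuffer {
    private final List<String> committed = new ArrayList<>();
    private String pending = "";

    // Feed the joined text of one message together with its is_final flag.
    void accept(String text, boolean isFinal) {
        if (isFinal) {
            committed.add(text); // final results are kept permanently
            pending = "";
        } else {
            pending = text;      // assumption: a newer interim result replaces the older one
        }
    }

    // Render committed lines, followed by the interim line marked with " ...".
    String render() {
        List<String> lines = new ArrayList<>(committed);
        if (!pending.isEmpty()) {
            lines.add(pending + " ...");
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        CaptionBuffer buffer = new CaptionBuffer();
        buffer.accept("Hello", false);
        buffer.accept("Hello world.", true);
        System.out.println(buffer.render());
    }
}
```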

Translation message fields

| Field name | Type | Description |
| --- | --- | --- |
| is_final | bool | Whether this sentence is the final result of the translation. true: this is the final result. false: not the final result. When this field is true, the translation engine considers the result finalized, but this does not imply that the sentence has ended semantically. |
| lang | string | The target language for the translation. |
| texts | repeated string | The result of translation. |
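Since a message can carry several Translation entries, it can help to index them by target language before display. The sketch below is one way to do that. SketchTranslation and TranslationSketch are hypothetical names, SketchTranslation stands in for the generated Translation class, the language codes are illustrative values only, and joining texts without a separator is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the generated Translation class.
// With the generated class, use getIsFinal(), getLang() and getTextsList().
record SketchTranslation(boolean isFinal, String lang, List<String> texts) {}

class TranslationSketch {
    // Map each target language to its joined translation text, in message order.
    static Map<String, String> byLanguage(List<SketchTranslation> trans) {
        Map<String, String> out = new LinkedHashMap<>();
        for (SketchTranslation t : trans) {
            // If a language appears more than once, append to the existing text.
            out.merge(t.lang(), String.join("", t.texts()), String::concat);
        }
        return out;
    }

    public static void main(String[] args) {
        // The language codes below are illustrative, not taken from the Agora documentation.
        List<SketchTranslation> trans = List.of(
                new SketchTranslation(true, "es-ES", List.of("Hola ", "mundo")),
                new SketchTranslation(true, "fr-FR", List.of("Bonjour")));
        byLanguage(trans).forEach((lang, text) -> System.out.println(lang + ": " + text));
    }
}
```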

Demo project

Agora provides an open-source speech-to-text demo project for your reference. Download it or view the source code for a detailed example.
