Parse transcription data

Agora uses Protocol Buffers (protobuf) to serialize transcription data. Protobuf, developed by Google, is a language-neutral, platform-independent way to serialize structured data. It enables efficient, consistent data handling across platforms by generating source code in multiple programming languages. Learn more at protobuf.dev.

Understand the tech

Agora Real-Time STT provides the SttMessage.proto file that defines the message format for speech-to-text conversion. This format serializes transcribed text data into an efficient transmission format, such as binary or JSON, for transmission through the data stream. This guide explains how to generate target language code using the Protobuf Compiler protoc, deserialize the received data stream, and extract specific text fields from the deserialized data structure.

Prerequisites

To follow this procedure, you must:

Have a valid Agora Account.
Have a valid Agora project with an app ID and a temporary token or a token server. For details, see Agora account management.
Have a computer with access to the internet. If your network has a firewall, follow the steps in Firewall requirements.
Join a Video SDK channel as a host and start streaming. Refer to the Voice SDK quickstart guide.

Enable Real-Time STT for your app.
Install the Protobuf compiler to generate code classes that process transcription text.

info
Since the format of Protobuf may vary across versions, best practice is to ensure that the Protobuf SDK versions used for generated code and client deserialization are consistent.

Parse transcription data using Protobuf

Follow these steps to write a script that calls the protoc compiler to generate code in different languages.

Create a Protobuf definition file

Protobuf allows you to generate source code in your preferred language based on the structure defined in the .proto file. Agora provides the following Protobuf definition for parsing Real-Time Transcription data. To use the file for generating code:

Copy the following Protobuf definition to a local SttMessage.proto file:

_38syntax = "proto3"; _38 _38package Agora.SpeechToText; _38option objc_class_prefix = "Stt"; _38option csharp_namespace = "AgoraSTTSample.Protobuf"; _38option java_package = "io.agora.rtc.speech2text"; _38option java_outer_classname = "AgoraSpeech2TextProtobuffer"; _38 _38message Text { _38 int32 vendor = 1; _38 int32 version = 2; _38 int32 seqnum = 3; _38 int64 uid = 4; _38 int32 flag = 5; _38 int64 time = 6; _38 int32 lang = 7; _38 int32 starttime = 8; _38 int32 offtime = 9; _38 repeated Word words = 10; _38 bool end_of_segment = 11; _38 int32 duration_ms = 12; _38 string data_type = 13; _38 repeated Translation trans = 14; _38 string culture = 15; _38 int64 text_ts = 16; _38} _38message Word { _38 string text = 1; _38 int32 start_ms = 2; _38 int32 duration_ms = 3; _38 bool is_final = 4; _38 double confidence = 5; _38} _38message Translation { _38 bool is_final = 1; _38 string lang = 2; _38 repeated string texts = 3; _38}

For a description of each field in the SttMessage.proto file, browse the Reference section.
Edit the following properties in your .proto file to match your project:
- package: The source code package namespace.
- option: The desired language options.

Generate source code script

Java
Objective-C
C#
JavaScript

Create a shell script named generate_code.sh with the following content:

#!/bin/sh# Specify the path to the protoc compiler. In the example code# The Protobuf version used is 21.12. You can replace it according to your actual needs.PROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc# Specify the path to the .proto file. # The detailed description of the data structure can be found in the reference section.PROTO_FILE=./SttMessage.proto# Specify the output directory.JAVA_OUT_DIR=$(pwd)/code/java# Create the output directory (if it doesn't exist).mkdir -p $JAVA_OUT_DIR# Generate Java code.$PROTOC_PATH --java_out=$JAVA_OUT_DIR $PROTO_FILE# Output a message once code generation is finished.echo "Code generation completed."

Ensure Protobuf dependencies are installed. If the dependencies are already installed, skip this step.

Install Protobuf Dependencies

Edit your project’s Podfile to add the following line:

# 3.21.12 indicates the Protobuf version. You can choose the appropriate version according to your actual needs.pod "Protobuf", "3.21.12"

Run the following command in the Podfile directory:
```
pod install
```

Open the .xcworkspace file generated in the project folder to proceed in Xcode.

Create a shell script, named generate_code.sh and add the following code to it:

#!/bin/sh# Specify the path to the protoc compiler. In the example code, the Protobuf version used is 21.12. You can replace it according to your actual needs.PROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc# Specify the path to the .proto file. The detailed description of the data structure can be found in the reference information.PROTO_FILE=./SttMessage.proto# Specify the output directory.OBJC_OUT_DIR=$(pwd)/code/objective-c# Create the output directory (if it doesn't exist).mkdir -p $OBJC_OUT_DIR# Generate Objective-C code.$PROTOC_PATH --objc_out=$OBJC_OUT_DIR $PROTO_FILE# Output a message once code generation is finished.echo "Code generation completed."

Create a shell script named generate_code.sh with the following content:

#!/bin/sh# Path to the protoc compilerPROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc# Path to the .proto filePROTO_FILE=./SttMessage.proto# Output directoryCSHARP_OUT_DIR=$(pwd)/code/csharp# Create output directory if it doesn't existmkdir -p $CSHARP_OUT_DIR# Generate C# code$PROTOC_PATH --csharp_out=$CSHARP_OUT_DIR $PROTO_FILEecho "C# code generation completed."

Replace ./SttMessage.proto with the correct path as explained in the Create a Protobuf definition section.

To generate JavaScript code, ensure that the necessary Protobuf dependencies are installed. Follow the steps below to install them.

Install Protobuf Dependencies

Open your project’s root directory and edit the package.json file to include the following dependencies:

{   "dependencies": {     "protobufjs": "^7.2.5"   },   "devDependencies": {     "pbjs": "^0.0.14",     "protobufjs-cli": "^1.1.2"   } }

Run the following command to install the dependencies:
```
npm install
```

Next, create a shell script:

Create a file named generate_code.sh.
Add the following content:

# Add the executable file path of protobufjs-cli to the PATH environment variable# Replace {absolute path of protobufjs-cli in your node_modules}/bin with the absolute path of protobufjs-cli in node_modulesexport "PATH=$PATH:{absolute path of protobufjs-cli in your node_modules}/bin"# Generate JavaScript example codepbjs -t json-module -w es6 ./SttMessage.proto > ./SttMessage_es6.jsecho "JavaScript code generation completed."

Replace ./SttMessage.proto with the path to the file you created in the Create a Protobuf Template section.

Run the Script

To generate a Protobuf class, run these commands in your terminal:

# Make the script executable
chmod +x generate_code.sh
# Run the script
./generate_code.sh

Deserialize transcription data

When transcription text is available, your Video SDK event handler receives the stream message callback. Use the generated Protobuf class to deserialize the received data and convert it back into a data structure or object.

Java
C#
JavaScript
Objective-C
Swift

// Join the channel and add callback eventsrtcManager.joinChannel(roomName, localUid, agora_token, roleType.equals(ROLE_TYPE_BROADCAST), new RtcManager.OnChannelListener() {    ...    // Callback for receiving data stream messages    @Override    public void onStreamMessage(int uid, int streamId, byte[] data) {        // Check if the remote user ID matches the specified streaming bot ID; if so, decode the data stream into a text object        if (String.valueOf(uid).equalsIgnoreCase(RTC_UID_STT_STREAM)) {            AgoraSpeech2TextProtobuffer.Text text = STTManager.getInstance().parseTextByte(roomName, data);            // Convert the parsed text object into JSON format and log it            LogUtil.d(originLogName, mGson.toJson(text));        }    }    // ...});public AgoraSpeech2TextProtobuffer.Text parseTextByte(String channel, byte[] data) {    // Declare a variable of type AgoraSpeech2TextProtobuffer.Text to store the deserialized object    AgoraSpeech2TextProtobuffer.Text textStream;    try {        // Deserialize the byte array data into an AgoraSpeech2TextProtobuffer.Text object        textStream = AgoraSpeech2TextProtobuffer.Text.parseFrom(data);    } catch (Exception ex) {        notifyErrorHandler(new ErrorInfo("parseTextByte", "-1", "Error parsing from parseTextByte >> " + ex.toString()));        return null;    }    // ...  }

private void InitRtcEngine(){       // Create an RTC engine instance    RtcEngine = Agora.Rtc.RtcEngine.CreateAgoraRtcEngine();        // Create an instance of the event handler class    AgoraEventHandler handler = new AgoraEventHandler(this);        // Create the RtcEngineContext object and set the channel profile to live broadcasting    RtcEngineContext context = new RtcEngineContext(_appID, 0,        CHANNEL_PROFILE_TYPE.CHANNEL_PROFILE_LIVE_BROADCASTING,        AUDIO_SCENARIO_TYPE.AUDIO_SCENARIO_DEFAULT);        // Initialize the engine    RtcEngine.Initialize(context);        // Add callback events    RtcEngine.InitEventHandler(handler);}// Define a class to handle RTC-related callbacks, inheriting from IRtcEngineEventHandlerinternal class AgoraEventHandler : IRtcEngineEventHandler{       // Callback for receiving data stream messages    public override void OnStreamMessage(RtcConnection connection, uint remoteUid, int streamId, byte[] data, uint length, ulong sentTs)    {        // Debug.Log(String.Format("remoteUid: {0}", remoteUid));                // If the remote user ID equals the specified streaming bot ID        if (remoteUid == {pusher bot uid}) {            // Parse Protobuf data            AgoraSTTSample.Protobuf.Text t = ProtobufUtility.ParseProtobufData(data);            // ...        }    }}

import AgoraRTC from "agora-rtc-sdk-ng"import protoRoot from "@/protobuf/SttMessage_es6.js"// Create an RTC client instancethis.rtc.client = AgoraRTC.createClient({ mode: "live", codec: "vp8", role: this.role })// Listen for stream message events and bind the event handler functionthis.rtc.client.on("stream-message", this.onStreamMessage.bind(this))// Callback for receiving data stream messagesfunction onStreamMessage(uid, stream) {    // Check if the remote user ID is the specified streaming bot ID; if not, return directly and do not proceed with further processing    if (uid != {pusher bot uid}) {        return    }    // Use Protobuf to decode the received data stream    let textstream = protoRoot.Agora.SpeechToText.lookup("Text").decode(data)    // ...}

// Temp.h#import "AgoraRtcKit/AgoraRtcKit.h"#import "./Protobuff/SttMessage.pbobjc.h"NS_ASSUME_NONNULL_BEGIN@interface Temp : NSObject<AgoraRtcEngineDelegate>@endNS_ASSUME_NONNULL_END// Temp.m@implementation Temp// Callback for receiving data stream messages- (void)rtcEngine:(AgoraRtcEngineKit *)engine receiveStreamMessageFromUid:(NSUInteger)uid streamId:(NSInteger)streamId data:(NSData *)data {    // Check if the remote user ID is the specified streaming bot ID; if not, return directly and do not proceed with further processing    if (uid != pusherUid) {        return;    }    NSError* error;    // Decode the received data stream    SttText* st =  [SttText parseFromData: data error: &error];    // ...}@end

// Callback for receiving data stream messagesfunc rtcEngine(_ engine: AgoraRtcEngineKit, receiveStreamMessageFromUid uid: UInt, streamId: Int, data: Data) {    // Check if the remote user ID is the specified streaming bot ID; if not, return directly and do not proceed with further processing    guard uid == {pusher bot uid} else {        return    }    // Decode the received data stream    let text = try? SttText.parse(from: data)    // ...}

Reference

This section contains content that completes the information on this page, or points you to documentation that explains other aspects to this product.

The following tables describe the fields in the SttMessage.proto file.

Text message fields

Field Name	Type	Description
`vendor`	`int32`	Reserved field.
`version`	`int32`	Reserved field.
`seqnum`	`int32`	Reserved field.
`uid`	`int64`	The user ID to which the text corresponds.
`flag`	`int32`	Reserved field.
`time`	`int64`	The start time of the transcription of this segment. It has a value only when `isFinal` is true; otherwise, it is 0.
`lang`	`int32`	Reserved field.
`starttime`	`int32`	Reserved field.
`offtime`	`int32`	Reserved field.
`words`	`repeated`	An array of transcription results. See WordMessage Types for details.
`end_of_segment`	`bool`	Reserved field.
`duration_ms`	`int32`	The time taken to transcribe, in milliseconds.
`data_type`	`string`	Data Type: - `transcribe`: Transcription. - `translate`: Text translation.
`trans`	`repeated`	An array of translation results. See TranslationMessage Types for details.
`culture`	`string`	The source language of the transcription.
`text_ts`	`int64`	The timestamp of the transcription used to align the source and target text during real-time translation.

Word message fields

Field Name	Type	Description
`text`	string	The result of transcription.
`start_ms`	int32	Reserved field.
`duration_ms`	int32	Reserved field.
`is_final`	bool	Is this sentence the final result of transcription? - true: This is the final result. - false: Not the final result. When this field is `true`, it indicates that the transcription engine believes the transcription result is finalized but doesn't imply the sentence has ended semantically.
`confidence`	double	Confidence indicates the speech-to-text engine’s confidence in the transcription result. Value: [0,1].

Translation message fields

Field Name	Type	Description
`is_final`	bool	Is this sentence the final result of the translation? - true: This is the final result. - false: Not the final result. When this field is `true`, it indicates that the translation engine believes the translation result is finalized but doesn't imply the sentence has ended semantically.
`lang`	string	The target language for the translation.
`texts`	repeated	The result of translation.

Demo project

Agora provides an open-source speech-to-text demo project for your reference. Download it or view the source code for a detailed example.

You are viewing Agora Docs forBetaproducts and features. Switch to Docs