Parse transcription data
Agora uses Protocol Buffers (protobuf) to serialize transcription data. Protobuf, developed by Google, is a language-neutral, platform-independent way to serialize structured data. It enables efficient, consistent data handling across platforms by generating source code in multiple programming languages. Learn more at protobuf.dev.
Understand the tech
Agora Real-Time STT provides the SttMessage.proto file, which defines the message format for speech-to-text conversion. Transcribed text is serialized in this format, such as binary or JSON, and sent through the data stream. This guide explains how to generate target-language code using the Protobuf compiler protoc, deserialize the received data stream, and extract specific text fields from the deserialized data structure.
Prerequisites
To follow this procedure, you must:
- Have a valid Agora Account.
- Have a valid Agora project with an app ID and a temporary token or a token server. For details, see Agora account management.
- Have a computer with access to the internet. If your network has a firewall, follow the steps in Firewall requirements.
- Join a Video SDK channel as a host and start streaming. Refer to the Voice SDK quickstart guide.
- Enable Real-Time STT for your app.
- Install the Protobuf compiler to generate code classes that process transcription text.
Info: Because the Protobuf format may vary across versions, make sure that the Protobuf version used to generate the code matches the Protobuf version used for deserialization on the client.
Parse transcription data using Protobuf
Follow these steps to write a script that calls the protoc compiler to generate code in different languages.
Create a Protobuf definition file
Protobuf allows you to generate source code in your preferred language based on the structure defined in the .proto file. Agora provides the following Protobuf definition for parsing Real-Time Transcription data. To use the file for generating code:
- Copy the following Protobuf definition to a local SttMessage.proto file:

  For a description of each field in the SttMessage.proto file, browse the Reference section.
- Edit the following properties in your .proto file to match your project:
  - package: The source code package namespace.
  - option: The desired language options.
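As an illustration, the package and option lines near the top of the file might look like the following fragment. The package name, Objective-C prefix, C# namespace, and Java outer class name are inferred from the class names used in the deserialization examples on this page. The java_package value is a placeholder that does not come from the source. Treat this as a sketch, and keep the values already present in the definition you copied unless your project needs different ones.

```protobuf
// Namespace for the generated code (appears as protoRoot.Agora.SpeechToText in JavaScript).
package Agora.SpeechToText;

// Language options; adjust these to match your project.
option objc_class_prefix = "Stt";                             // yields SttText in Objective-C
option csharp_namespace = "AgoraSTTSample.Protobuf";          // yields AgoraSTTSample.Protobuf.Text in C#
option java_package = "com.example.stt";                      // placeholder, not from the source
option java_outer_classname = "AgoraSpeech2TextProtobuffer";  // yields AgoraSpeech2TextProtobuffer.Text in Java
```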
Generate source code script
- Java
- Objective-C
- C#
- JavaScript
For Java, create a shell script named generate_code.sh with the following content:

    #!/bin/sh
    # Specify the path to the protoc compiler. In the example code,
    # the Protobuf version used is 21.12. You can replace it according to your actual needs.
    PROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc
    # Specify the path to the .proto file.
    # The detailed description of the data structure can be found in the reference section.
    PROTO_FILE=./SttMessage.proto
    # Specify the output directory.
    JAVA_OUT_DIR=$(pwd)/code/java
    # Create the output directory (if it doesn't exist).
    mkdir -p "$JAVA_OUT_DIR"
    # Generate Java code.
    "$PROTOC_PATH" --java_out="$JAVA_OUT_DIR" "$PROTO_FILE"
    # Output a message once code generation is finished.
    echo "Code generation completed."
For Objective-C, ensure that the Protobuf dependencies are installed. If they are already installed, skip this step.
Install Protobuf dependencies
- Edit your project’s Podfile to add the following line:

      # 3.21.12 indicates the Protobuf version. You can choose the appropriate version according to your actual needs.
      pod "Protobuf", "3.21.12"
- Run the following command in the Podfile directory:

      pod install
Open the .xcworkspace file generated in the project folder to proceed in Xcode.
Create a shell script named generate_code.sh and add the following code to it:

    #!/bin/sh
    # Specify the path to the protoc compiler. In the example code, the Protobuf version used is 21.12.
    # You can replace it according to your actual needs.
    PROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc
    # Specify the path to the .proto file.
    # The detailed description of the data structure can be found in the reference information.
    PROTO_FILE=./SttMessage.proto
    # Specify the output directory.
    OBJC_OUT_DIR=$(pwd)/code/objective-c
    # Create the output directory (if it doesn't exist).
    mkdir -p "$OBJC_OUT_DIR"
    # Generate Objective-C code.
    "$PROTOC_PATH" --objc_out="$OBJC_OUT_DIR" "$PROTO_FILE"
    # Output a message once code generation is finished.
    echo "Code generation completed."
For C#, create a shell script named generate_code.sh with the following content:

    #!/bin/sh
    # Path to the protoc compiler
    PROTOC_PATH=./protoc-21.12-osx-aarch_64/bin/protoc
    # Path to the .proto file
    PROTO_FILE=./SttMessage.proto
    # Output directory
    CSHARP_OUT_DIR=$(pwd)/code/csharp
    # Create output directory if it doesn't exist
    mkdir -p "$CSHARP_OUT_DIR"
    # Generate C# code
    "$PROTOC_PATH" --csharp_out="$CSHARP_OUT_DIR" "$PROTO_FILE"
    echo "C# code generation completed."

Replace ./SttMessage.proto with the path to the file you created in the Create a Protobuf definition file section.
To generate JavaScript code, ensure that the necessary Protobuf dependencies are installed. Follow the steps below to install them.
Install Protobuf dependencies
- Open your project’s root directory and edit the package.json file to include the following dependencies:

      {
        "dependencies": {
          "protobufjs": "^7.2.5"
        },
        "devDependencies": {
          "pbjs": "^0.0.14",
          "protobufjs-cli": "^1.1.2"
        }
      }
- Run the following command to install the dependencies:

      npm install
Next, create a shell script:
- Create a file named generate_code.sh.
- Add the following content:

      # Add the executable file path of protobufjs-cli to the PATH environment variable.
      # Replace {absolute path of protobufjs-cli in your node_modules}/bin with the absolute path of protobufjs-cli in node_modules.
      export "PATH=$PATH:{absolute path of protobufjs-cli in your node_modules}/bin"
      # Generate JavaScript example code
      pbjs -t json-module -w es6 ./SttMessage.proto > ./SttMessage_es6.js
      echo "JavaScript code generation completed."

Replace ./SttMessage.proto with the path to the file you created in the Create a Protobuf definition file section.
Run the script
To generate the Protobuf classes, run the generate_code.sh script from your terminal.
Deserialize transcription data
When transcription text is available, your Video SDK event handler receives the stream message callback. Use the generated Protobuf class to deserialize the received data and convert it back into a data structure or object.
- Java
- C#
- JavaScript
- Objective-C
- Swift
    // Join the channel and add callback events
    rtcManager.joinChannel(roomName, localUid, agora_token, roleType.equals(ROLE_TYPE_BROADCAST), new RtcManager.OnChannelListener() {
        ...
        // Callback for receiving data stream messages
        @Override
        public void onStreamMessage(int uid, int streamId, byte[] data) {
            // Check if the remote user ID matches the specified streaming bot ID; if so, decode the data stream into a text object
            if (String.valueOf(uid).equalsIgnoreCase(RTC_UID_STT_STREAM)) {
                AgoraSpeech2TextProtobuffer.Text text = STTManager.getInstance().parseTextByte(roomName, data);
                // Convert the parsed text object into JSON format and log it
                LogUtil.d(originLogName, mGson.toJson(text));
            }
        }
        // ...
    });

    public AgoraSpeech2TextProtobuffer.Text parseTextByte(String channel, byte[] data) {
        // Declare a variable of type AgoraSpeech2TextProtobuffer.Text to store the deserialized object
        AgoraSpeech2TextProtobuffer.Text textStream;
        try {
            // Deserialize the byte array data into an AgoraSpeech2TextProtobuffer.Text object
            textStream = AgoraSpeech2TextProtobuffer.Text.parseFrom(data);
        } catch (Exception ex) {
            notifyErrorHandler(new ErrorInfo("parseTextByte", "-1", "Error parsing from parseTextByte >> " + ex.toString()));
            return null;
        }
        // ...
    }
    private void InitRtcEngine()
    {
        // Create an RTC engine instance
        RtcEngine = Agora.Rtc.RtcEngine.CreateAgoraRtcEngine();
        // Create an instance of the event handler class
        AgoraEventHandler handler = new AgoraEventHandler(this);
        // Create the RtcEngineContext object and set the channel profile to live broadcasting
        RtcEngineContext context = new RtcEngineContext(_appID, 0,
            CHANNEL_PROFILE_TYPE.CHANNEL_PROFILE_LIVE_BROADCASTING,
            AUDIO_SCENARIO_TYPE.AUDIO_SCENARIO_DEFAULT);
        // Initialize the engine
        RtcEngine.Initialize(context);
        // Add callback events
        RtcEngine.InitEventHandler(handler);
    }

    // Define a class to handle RTC-related callbacks, inheriting from IRtcEngineEventHandler
    internal class AgoraEventHandler : IRtcEngineEventHandler
    {
        // Callback for receiving data stream messages
        public override void OnStreamMessage(RtcConnection connection, uint remoteUid, int streamId, byte[] data, uint length, ulong sentTs)
        {
            // Debug.Log(String.Format("remoteUid: {0}", remoteUid));
            // If the remote user ID equals the specified streaming bot ID
            if (remoteUid == {pusher bot uid})
            {
                // Parse Protobuf data
                AgoraSTTSample.Protobuf.Text t = ProtobufUtility.ParseProtobufData(data);
                // ...
            }
        }
    }
    import AgoraRTC from "agora-rtc-sdk-ng"
    import protoRoot from "@/protobuf/SttMessage_es6.js"

    // Create an RTC client instance
    this.rtc.client = AgoraRTC.createClient({ mode: "live", codec: "vp8", role: this.role })
    // Listen for stream message events and bind the event handler function
    this.rtc.client.on("stream-message", this.onStreamMessage.bind(this))

    // Callback for receiving data stream messages; data is the received payload
    function onStreamMessage(uid, data) {
        // If the remote user ID is not the specified streaming bot ID, return without further processing
        if (uid != {pusher bot uid}) {
            return
        }
        // Use Protobuf to decode the received data stream
        let textstream = protoRoot.Agora.SpeechToText.lookup("Text").decode(data)
        // ...
    }
    // Temp.h
    #import "AgoraRtcKit/AgoraRtcKit.h"
    #import "./Protobuff/SttMessage.pbobjc.h"

    NS_ASSUME_NONNULL_BEGIN
    @interface Temp : NSObject<AgoraRtcEngineDelegate>
    @end
    NS_ASSUME_NONNULL_END

    // Temp.m
    @implementation Temp
    // Callback for receiving data stream messages
    - (void)rtcEngine:(AgoraRtcEngineKit *)engine receiveStreamMessageFromUid:(NSUInteger)uid streamId:(NSInteger)streamId data:(NSData *)data {
        // If the remote user ID is not the specified streaming bot ID, return without further processing
        if (uid != pusherUid) {
            return;
        }
        NSError* error;
        // Decode the received data stream
        SttText* st = [SttText parseFromData: data error: &error];
        // ...
    }
    @end
    // Callback for receiving data stream messages
    func rtcEngine(_ engine: AgoraRtcEngineKit, receiveStreamMessageFromUid uid: UInt, streamId: Int, data: Data) {
        // If the remote user ID is not the specified streaming bot ID, return without further processing
        guard uid == {pusher bot uid} else {
            return
        }
        // Decode the received data stream
        let text = try? SttText.parse(from: data)
        // ...
    }
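The callbacks above share one pattern: ignore messages that do not come from the streaming bot, then try to decode the payload and handle a failed decode. The following is a minimal, library-free Java sketch of that pattern. SttMessageFilter is a hypothetical helper that is not part of the Agora SDK or the generated code, and the parser function is a stand-in: in a real project it would wrap the generated parseFrom call and rethrow its checked exception as an unchecked one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper, not part of the Agora SDK or the protoc-generated code.
final class SttMessageFilter<T> {
    private final long botUid;
    private final Function<byte[], T> parser;

    SttMessageFilter(long botUid, Function<byte[], T> parser) {
        this.botUid = botUid;
        this.parser = parser;
    }

    /** Returns the parsed message, or empty if the sender is not the bot or parsing fails. */
    Optional<T> handle(long uid, byte[] data) {
        if (uid != botUid) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(parser.apply(data));
        } catch (RuntimeException e) {
            // Malformed payload: drop it instead of letting the exception escape the callback.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in parser that decodes UTF-8 text; a real one would call the generated class.
        SttMessageFilter<String> filter =
                new SttMessageFilter<>(1000L, bytes -> new String(bytes, StandardCharsets.UTF_8));
        System.out.println(filter.handle(1000L, "hi".getBytes(StandardCharsets.UTF_8)).isPresent());
        System.out.println(filter.handle(7L, new byte[0]).isPresent());
    }
}
```

Returning an Optional keeps the "not for us" and "could not parse" cases out of the caller's main path; whether to log or report parse failures is left to the application.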
Reference
This section contains content that completes the information on this page, or points you to documentation that explains other aspects of this product.
The following tables describe the fields in the SttMessage.proto file.
Text message fields
Field Name | Type | Description |
---|---|---|
vendor | int32 | Reserved field. |
version | int32 | Reserved field. |
seqnum | int32 | Reserved field. |
uid | int64 | The user ID to which the text corresponds. |
flag | int32 | Reserved field. |
time | int64 | The start time of the transcription of this segment. It has a value only when is_final is true; otherwise, it is 0. |
lang | int32 | Reserved field. |
starttime | int32 | Reserved field. |
offtime | int32 | Reserved field. |
words | repeated | An array of transcription results. See Word message fields for details. |
end_of_segment | bool | Reserved field. |
duration_ms | int32 | The time taken to transcribe, in milliseconds. |
data_type | string | The data type: transcribe (transcription) or translate (text translation). |
trans | repeated | An array of translation results. See Translation message fields for details. |
culture | string | The source language of the transcription. |
text_ts | int64 | The timestamp of the transcription used to align the source and target text during real-time translation. |
Word message fields
Field Name | Type | Description |
---|---|---|
text | string | The result of transcription. |
start_ms | int32 | Reserved field. |
duration_ms | int32 | Reserved field. |
is_final | bool | Whether this sentence is the final result of transcription: true means it is the final result; false means it is not. When this field is true, the transcription engine considers the transcription result finalized, but this does not imply that the sentence has ended semantically. |
confidence | double | The speech-to-text engine’s confidence in the transcription result. Value range: [0, 1]. |
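To illustrate how the word fields might be consumed, the sketch below joins the text of each word entry and reports whether the result is final. The Word class is a hypothetical plain-Java stand-in that mirrors only the text and is_final fields; in a real project you would read the same values from the generated class. Treating a result as final only when every word entry is final is an assumption of this sketch, not documented behavior.

```java
import java.util.List;

// Hypothetical stand-in types; these are not the protoc-generated classes.
final class TranscriptSketch {

    /** Mirrors the text and is_final fields of one word entry. */
    static final class Word {
        final String text;
        final boolean isFinal;

        Word(String text, boolean isFinal) {
            this.text = text;
            this.isFinal = isFinal;
        }
    }

    /** Concatenates the text of all word entries in order. */
    static String joinText(List<Word> words) {
        StringBuilder sb = new StringBuilder();
        for (Word w : words) {
            sb.append(w.text);
        }
        return sb.toString();
    }

    /** Assumption: a result is final only if it has entries and all of them are final. */
    static boolean isFinal(List<Word> words) {
        if (words.isEmpty()) {
            return false;
        }
        for (Word w : words) {
            if (!w.isFinal) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Word> words = List.of(new Word("Hello ", true), new Word("world", true));
        System.out.println(joinText(words) + " final=" + isFinal(words));
    }
}
```

A common use of the is_final flag is to overwrite a provisional caption while results are not final and to commit the caption once they are.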
Translation message fields
Field Name | Type | Description |
---|---|---|
is_final | bool | Whether this sentence is the final result of the translation: true means it is the final result; false means it is not. When this field is true, the translation engine considers the translation result finalized, but this does not imply that the sentence has ended semantically. |
lang | string | The target language for the translation. |
texts | repeated | The result of translation. |
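The text_ts field is described above as the value used to align source and target text. One way to use it, sketched below in plain Java, is to remember the source text by text_ts and look it up when a translation arrives. TranslationAligner and its method names are hypothetical, and the sketch assumes that a transcription message and its translation carry the same text_ts value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; its parameters mirror fields described in the tables above.
final class TranslationAligner {
    // Source text keyed by text_ts. A real application would also evict old entries.
    private final Map<Long, String> sourceByTs = new HashMap<>();

    /** Records the source text from a message whose data_type is "transcribe". */
    void onTranscribe(long textTs, String sourceText) {
        sourceByTs.put(textTs, sourceText);
    }

    /**
     * Handles one translation entry from a message whose data_type is "translate".
     * Returns "source => [lang] target", or null if no source with this text_ts was seen.
     */
    String onTranslate(long textTs, String lang, List<String> texts) {
        String source = sourceByTs.get(textTs);
        if (source == null) {
            return null;
        }
        return source + " => [" + lang + "] " + String.join("", texts);
    }

    public static void main(String[] args) {
        TranslationAligner aligner = new TranslationAligner();
        aligner.onTranscribe(1000L, "Hello");
        System.out.println(aligner.onTranslate(1000L, "es-ES", List.of("Hola")));
    }
}
```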
Demo project
Agora provides an open-source speech-to-text demo project for your reference. Download it or view the source code for a detailed example.