Real time specification
Scribe provides a websocket-based API that lets you stream audio in real time, in chunks as short as 0.25 seconds. Scribe then streams back the transcribed text within approximately 15 to 30 seconds. The real time API is accessible at:
wss://scribe.kensho.com/ws
Workflow
After connecting to the Real Time API, you should issue an Authenticate request and wait for an Authenticated response from the server before sending transcription requests. To start transcribing audio, send a StartTranscription request and wait for a TranscriptionStarted response from the server before sending audio.
You can then upload chunks of audio between 250 milliseconds and 15 seconds long using the AddData method.
Uploads must not exceed 1.5x real time, with a 30 second buffer.
When audio is uploaded, the server will acknowledge it by sending a DataAdded response.
When transcribed text is available, the server will send an AddTranscript message.
After finishing uploading audio, the client must send an EndOfStream message, after which the server will transcribe all remaining audio and issue an EndOfTranscript message.
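The workflow above can be sketched as a small, transport-agnostic driver. This is an illustration under stated assumptions, not an official client: `run_session`, `send`, and `recv` are hypothetical names, the sketch waits for each DataAdded before sending the next chunk purely for simplicity, and it assumes `last_sequence_number` is the sequence number of the final AddData message.

```python
import base64
import json
from typing import Callable, Iterable, List


def run_session(
    send: Callable[[str], None],
    recv: Callable[[], str],
    token: str,
    chunks: Iterable[bytes],
) -> List[dict]:
    """Run one transcription over an open connection; return the transcripts."""
    transcripts: List[dict] = []

    def wait_for(expected: str) -> dict:
        # Read until the expected message arrives. Transcripts can arrive at
        # any point after audio is uploaded, so collect them along the way.
        while True:
            msg = json.loads(recv())
            kind = msg["message"]
            if kind == "Error":
                raise RuntimeError(f"{msg.get('type')}: {msg.get('reason')}")
            if kind == "AddTranscript":
                transcripts.append(msg["transcript"])
            elif kind == expected:
                return msg

    send(json.dumps({"message": "Authenticate", "token": token}))
    wait_for("Authenticated")

    send(json.dumps({
        "message": "StartTranscription",
        "audio_format": {
            "type": "RAW",
            "encoding": "pcm_s16le",
            "sample_rate_hz": 16000,
            "num_channels": 1,
        },
    }))
    wait_for("TranscriptionStarted")

    last = -1  # stays -1 if no chunks are supplied
    for last, chunk in enumerate(chunks):
        send(json.dumps({
            "message": "AddData",
            "audio": base64.b64encode(chunk).decode("ascii"),
            "sequence_number": last,
        }))
        wait_for("DataAdded")

    # Assumption: last_sequence_number is the number of the final AddData.
    send(json.dumps({"message": "EndOfStream", "last_sequence_number": last}))
    wait_for("EndOfTranscript")
    return transcripts
```

In practice `send` and `recv` would be bound to whatever websocket client you use. A production client would likely also pace uploads and read acknowledgements concurrently instead of blocking on each one.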
Client Initiated Messages
Authenticate
Authenticate the websocket connection. This must be the first method called after opening the websocket.
{
    "message": "Authenticate",
    "token": str,  # The authentication token
}
StartTranscription
Start a transcription request. This must be the first method called after authentication.
{
    "message": "StartTranscription",
    "audio_format": {
        "type": str,  # Only 'RAW' supported
        "encoding": str,  # Only 'pcm_s16le' supported
        "sample_rate_hz": int,  # Only 16000 supported
        "num_channels": int,  # Only 1 supported
    },
    "hotwords": List[str],  # An optional list of up to 1024 words to weight higher on transcription
}
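As a rough sketch, a client could build this message with a helper that fills in the only supported audio format and enforces the hotword limit locally. The function name `start_transcription` and the constant `MAX_HOTWORDS` are hypothetical; only the field names and limits come from the schema above.

```python
from typing import List, Optional

MAX_HOTWORDS = 1024  # limit stated in the StartTranscription schema


def start_transcription(hotwords: Optional[List[str]] = None) -> dict:
    """Build a StartTranscription message using the only supported format."""
    message = {
        "message": "StartTranscription",
        "audio_format": {
            "type": "RAW",
            "encoding": "pcm_s16le",
            "sample_rate_hz": 16000,
            "num_channels": 1,
        },
    }
    # hotwords is optional, so the key is only added when a list is given.
    if hotwords:
        if len(hotwords) > MAX_HOTWORDS:
            raise ValueError(f"at most {MAX_HOTWORDS} hotwords are allowed")
        message["hotwords"] = list(hotwords)
    return message
```

Checking the limit on the client side avoids triggering a server Error for a request that could never succeed.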
ResumeTranscription
Resume a transcription request whose connection was dropped before the transcription completed. When resuming a transcription, this must be the first and only message sent before AddData.
{
    "message": "ResumeTranscription",
    "request_id": str,  # The request id returned from the previous TranscriptionStarted message
    "token": str,  # The authentication token
}
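One way a client might handle a dropped connection is sketched below. It assumes the client keeps its already-sent chunks in a dictionary keyed by sequence number (a design choice of this sketch, not something the API requires) and uses the `sequence_number` from the TranscriptionResumed response to decide which chunks to send again. The helper names are hypothetical.

```python
import base64
from typing import Dict, List


def resume_message(request_id: str, token: str) -> dict:
    """Build the ResumeTranscription message sent on the new connection."""
    return {
        "message": "ResumeTranscription",
        "request_id": request_id,
        "token": token,
    }


def messages_to_resend(resumed: dict, sent_chunks: Dict[int, bytes]) -> List[dict]:
    """Rebuild AddData messages from the sequence number the server expects next."""
    expected = resumed["sequence_number"]
    return [
        {
            "message": "AddData",
            "audio": base64.b64encode(sent_chunks[n]).decode("ascii"),
            "sequence_number": n,
        }
        for n in sorted(sent_chunks)
        if n >= expected
    ]
```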
AddData
Upload a chunk of audio to the server.
{
    "message": "AddData",
    "audio": str,  # base64 encoded string representing audio
    "sequence_number": int  # Starting at 0, each `AddData` must increment by 1
}
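The following sketch shows how raw audio might be split into AddData messages. With the supported format (16 kHz, 16-bit, mono) one second of audio is 32,000 bytes, so the 250 millisecond to 15 second chunk limits correspond to 8,000 to 480,000 bytes. The function name and the `chunk_ms` parameter are hypothetical, and the final chunk may come out shorter than 250 milliseconds; how the server treats a short tail is not covered by this document.

```python
import base64
from typing import Iterator

BYTES_PER_SECOND = 16000 * 2 * 1  # 16 kHz, 2 bytes per sample, 1 channel


def add_data_messages(pcm: bytes, chunk_ms: int = 1000) -> Iterator[dict]:
    """Yield AddData messages for raw pcm_s16le audio, numbered from 0."""
    if not 250 <= chunk_ms <= 15_000:
        raise ValueError("chunk_ms must be between 250 and 15000")
    # 32 bytes per millisecond, so the chunk size is always sample-aligned.
    chunk_bytes = BYTES_PER_SECOND * chunk_ms // 1000
    for seq, start in enumerate(range(0, len(pcm), chunk_bytes)):
        chunk = pcm[start:start + chunk_bytes]
        yield {
            "message": "AddData",
            "audio": base64.b64encode(chunk).decode("ascii"),
            "sequence_number": seq,
        }
```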
EndOfStream
Sent when the client has finished uploading audio data to the server.
{
    "message": "EndOfStream",
    "last_sequence_number": int
}
Server Initiated Messages
Authenticated
Sent when the server has received an Authenticate request from a client with a valid authentication token.
If the request fails, the server will send an Error instead of Authenticated.
{
    "message": "Authenticated",
}
TranscriptionStarted
Sent when the server has received a StartTranscription request from a client.
If the request fails, the server will send an Error instead of TranscriptionStarted.
{
    "message": "TranscriptionStarted",
    "request_id": str
}
TranscriptionResumed
Sent when the server has received a ResumeTranscription request from a client.
If the request fails, the server will send an Error instead of TranscriptionResumed.
{
    "message": "TranscriptionResumed",
    "request_id": str,
    "sequence_number": int  # The sequence number expected with the next `AddData` message
}
DataAdded
Acknowledges audio data uploaded by the client.
{
    "message": "DataAdded",
    "sequence_number": int
}
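A client that does not block on each acknowledgement may want to track which uploads are still unacknowledged. The sketch below is one simple way to do that; `AckTracker` is a hypothetical helper, and it assumes the server sends one DataAdded per AddData, carrying the same sequence number.

```python
from typing import Set


class AckTracker:
    """Track AddData sequence numbers that have not been acknowledged yet."""

    def __init__(self) -> None:
        self.pending: Set[int] = set()

    def on_sent(self, sequence_number: int) -> None:
        # Call right after an AddData message has been written to the socket.
        self.pending.add(sequence_number)

    def on_message(self, msg: dict) -> None:
        # Only DataAdded messages clear a pending entry; others are ignored.
        if msg.get("message") == "DataAdded":
            self.pending.discard(msg["sequence_number"])
```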
AddTranscript
Send transcribed text back to the client.
{
    "message": "AddTranscript",
    "transcript": "<transcript format defined below>"
}
Transcript format
{
    "transcript": str,
    "accuracy": float(0, 1),
    "sequence_number": int,
    "speaker_id": int,
    "speaker_accuracy": float,
    "token_meta": [
        {
            "transcript": str,
            "accuracy": float(0, 1),
            "start_ms": float,
            "duration_ms": float,
            "align_success": bool
        },
        ...
    ]
}
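As an illustration of consuming this format, the sketch below lists the tokens whose `accuracy` falls below a threshold, together with their time span. It assumes `accuracy` is a confidence score where higher is better and that a token ends at `start_ms + duration_ms`; the function name and the 0.8 threshold are arbitrary choices for the example.

```python
from typing import List, Tuple


def low_confidence_tokens(
    transcript: dict, threshold: float = 0.8
) -> List[Tuple[str, float, float]]:
    """Return (text, start_ms, end_ms) for tokens with accuracy below threshold."""
    flagged = []
    for token in transcript["token_meta"]:
        if token["accuracy"] < threshold:
            end_ms = token["start_ms"] + token["duration_ms"]
            flagged.append((token["transcript"], token["start_ms"], end_ms))
    return flagged
```

A caller might use the returned spans to highlight uncertain words for human review.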
EndOfTranscript
Signal to the client that all audio has been transcribed and returned.
{
    "message": "EndOfTranscript",
}
Error
{
    "message": "Error",
    "type": str,
    "reason": str
}
Error Handling
There are a number of reasons a message could raise an error; exceeding the rate limit and uploading invalid audio are two examples.
If any message uploaded by a client triggers an error, the server will emit an Error message with information about why the error happened.
Once an Error has been emitted to a client, all subsequent requests will also emit an Error and fail.
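Because every request after an Error also fails, a client may as well stop sending as soon as one arrives. The sketch below wraps a send function so that nothing more goes out once an Error has been seen; `ScribeError` and `GuardedSender` are hypothetical names, and what to do next (for example reconnecting, or attempting ResumeTranscription) is left to the caller.

```python
import json
from typing import Callable, Optional


class ScribeError(Exception):
    """Raised when the server sends an Error message."""

    def __init__(self, type_: str, reason: str) -> None:
        super().__init__(f"{type_}: {reason}")
        self.type = type_
        self.reason = reason


class GuardedSender:
    """Refuse to send once the server has reported an Error."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self._send = send
        self.error: Optional[ScribeError] = None

    def on_message(self, raw: str) -> dict:
        # Parse an incoming message; remember and raise the first Error seen.
        msg = json.loads(raw)
        if msg["message"] == "Error":
            self.error = ScribeError(msg["type"], msg["reason"])
            raise self.error
        return msg

    def send(self, message: dict) -> None:
        # After an Error, further requests would fail anyway, so fail locally.
        if self.error is not None:
            raise self.error
        self._send(json.dumps(message))
```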
Real Time API Example Usage
For example usage, please see the Real time development guide.