Batch API development
Scribe's Batch API allows for asynchronous transcription of audio / video files through a RESTful API.
There are three steps to this asynchronous transcription:
- Start the transcription by sending Scribe the audio / video to transcribe.
- Wait for the transcription to complete.
- Retrieve the transcript in the desired output format.
The following sections provide examples of how to perform these three steps. The examples have the following prerequisites:
- All Shell examples on this page require curl (https://everything.curl.dev/get).
- All examples use a Kensho access token set as the environment variable ACCESS_TOKEN. More information on access tokens, and how to obtain them, can be found in the authentication guide.
Starting a new transcription
To start transcribing an audio / video file, send a POST request to Scribe's API endpoint https://scribe.kensho.com/api/v2/transcription. The request data can use one of two content types:
- Multipart Form-Data (multipart/form-data), which allows for transcribing local data.
- JSON (application/json), which allows for transcribing remote data.
Either content type allows additional JSON options to be provided for use during transcription.
Transcribing with local data
If the audio / video data to transcribe is 'local' to where your code runs (easily accessible, on the same file system, on an internal object storage network, etc.), you can send the audio / video directly in the transcription request.
# Assumes a local mp3 file named 'sample.mp3' and that the ACCESS_TOKEN environment variable is set.
curl -XPOST \
-F "media=@sample.mp3;type=application/mp3" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
https://scribe.kensho.com/api/v2/transcription
If the file exists and your access token is valid, you should get a transcription id like:
{
"transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}
which you can use later to check if the transcription is complete or to retrieve the final transcript.
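The request and the response parsing can be wrapped in a small Bash function. This is a sketch, not an official client: submit_local_file and extract_transcription_id are hypothetical names, the default media type simply mirrors the curl example above, and the sed-based parsing assumes a flat response shaped like the example (a JSON tool such as jq would be more robust).

```shell
#!/usr/bin/env bash
# Sketch: submit a local file and print the returned transcription_id.
# submit_local_file and extract_transcription_id are hypothetical helper names.
SCRIBE_TRANSCRIPTION_URL="https://scribe.kensho.com/api/v2/transcription"

# Read a JSON response on stdin and print the value of "transcription_id".
# A simple sed match is used so that no JSON tool is required.
extract_transcription_id() {
  sed -n 's/.*"transcription_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage: submit_local_file <path> [media type]
# The default media type mirrors the curl example above.
submit_local_file() {
  local path="$1"
  local media_type="${2:-application/mp3}"
  local response
  if [ ! -f "$path" ]; then
    echo "no such file: ${path}" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP errors instead of printing the error body.
  response="$(curl -sS --fail -X POST \
    -F "media=@${path};type=${media_type}" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    "$SCRIBE_TRANSCRIPTION_URL")" || return 1
  printf '%s\n' "$response" | extract_transcription_id
}
```

For example, id="$(submit_local_file sample.mp3)" stores the id for the polling and retrieval steps.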
Transcribing with remote data
If the audio / video data to transcribe is 'remote' (stored in a separate cloud service, etc.) and:
- that data can be retrieved through a RESTful GET call, and
- no header-based authentication is needed to retrieve the data (i.e., no Authorization header, etc.; mechanisms like presigned URLs are, however, fine),
then Scribe will accept a URL from which it will attempt to retrieve the audio / video file to transcribe.
# Replace <Media URL> with the URL where the data lives (keep the double quotes so the body is valid JSON).
# This assumes an ACCESS_TOKEN environment variable is set.
curl -XPOST \
-d '{"media_url": "<Media URL>"}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
https://scribe.kensho.com/api/v2/transcription
If the file exists and your access token is valid, you should get a transcription id like:
{
"transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}
which you can use later to check if the transcription is complete or to retrieve the final transcript.
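Because the request body is JSON, the media URL must be sent as a quoted JSON string. The sketch below builds the payload with the necessary escaping and then submits it; build_media_url_payload and submit_media_url are hypothetical helper names, and the response parsing assumes a flat response like the example above.

```shell
#!/usr/bin/env bash
# Sketch: submit a remote media URL as JSON and print the transcription_id.
# build_media_url_payload and submit_media_url are hypothetical helper names.
SCRIBE_TRANSCRIPTION_URL="https://scribe.kensho.com/api/v2/transcription"

# Build {"media_url": "..."}, escaping backslashes and double quotes so the
# URL is always a valid JSON string.
build_media_url_payload() {
  local escaped
  escaped="$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')"
  printf '{"media_url": "%s"}' "$escaped"
}

# Usage: submit_media_url <media URL>
submit_media_url() {
  local payload response
  payload="$(build_media_url_payload "$1")"
  response="$(curl -sS --fail -X POST \
    -d "$payload" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    "$SCRIBE_TRANSCRIPTION_URL")" || return 1
  # Pull the id out of a flat response such as {"transcription_id": "..."}.
  printf '%s\n' "$response" |
    sed -n 's/.*"transcription_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Building the body with printf instead of pasting the URL into a literal avoids the most common mistake with this request: an unquoted URL, which makes the body invalid JSON.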
Waiting for the transcription to complete
It can take a few minutes for a transcription to complete. During that time you can 'poll' Scribe to check whether the transcription is complete. Alternatively, or additionally, if you provided a callback URI when submitting the transcription, Scribe will attempt to notify you when the transcript is complete. Scribe therefore supports two different, but non-exclusive, ways to determine when a transcription is complete:
- Polling to determine when the transcription is complete.
- Notification through a callback / webhook when the transcription is complete.
Polling for completion
To poll for completion, you need the transcription_id that was returned from the original transcription request. Polling is a HEAD request to the https://scribe.kensho.com/api/v2/transcription/{transcription_id} endpoint, and only the user who originally submitted the transcription request can poll for its completion.
# Make sure to replace the 'Transcription Id' with the id obtained after submitting the transcription.
# This assumes an ACCESS_TOKEN environment variable is set.
curl -I \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
https://scribe.kensho.com/api/v2/transcription/<Transcription Id>
This returns an HTTP/2 <status code> response, where:
- a 200 indicates the transcription is complete
- a 202 indicates the transcription is still in progress
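Polling can be automated with a loop that repeats the HEAD request until the status changes. In this sketch, curl's --write-out option prints just the status code; the helper names, the 30-second interval, and the 60-attempt limit are assumptions for illustration rather than Scribe recommendations, and any status other than 200 or 202 is treated as an error.

```shell
#!/usr/bin/env bash
# Sketch of a polling loop. SCRIBE_TRANSCRIPTION_URL, POLL_INTERVAL and
# MAX_POLLS are names chosen for this example, not Scribe settings.
SCRIBE_TRANSCRIPTION_URL="https://scribe.kensho.com/api/v2/transcription"
POLL_INTERVAL="${POLL_INTERVAL:-30}"   # seconds between HEAD requests
MAX_POLLS="${MAX_POLLS:-60}"           # give up after this many attempts

# Print only the HTTP status code of a HEAD request for one transcription.
transcription_status() {
  curl -s -o /dev/null -I -w '%{http_code}' \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    "${SCRIBE_TRANSCRIPTION_URL}/$1"
}

# Return 0 once the status is 200, keep waiting while it is 202, and
# return 1 on any other status or 2 if MAX_POLLS attempts are used up.
wait_for_transcription() {
  local id="$1" attempt status
  for (( attempt = 1; attempt <= MAX_POLLS; attempt++ )); do
    status="$(transcription_status "$id")"
    case "$status" in
      200) return 0 ;;
      202) (( attempt < MAX_POLLS )) && sleep "$POLL_INTERVAL" ;;
      *) echo "unexpected status ${status} for ${id}" >&2; return 1 ;;
    esac
  done
  echo "gave up waiting for ${id}" >&2
  return 2
}
```

For example, wait_for_transcription "$id" && echo "ready" blocks until the transcript can be retrieved. If curl cannot reach the server at all, --write-out prints 000, which this sketch reports as an unexpected status.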
Notification of completion
To be notified when a transcription is complete, provide the callback_uri option when submitting the transcription request. When transcription finishes, and if Scribe is able to call that URI, Scribe sends a message containing the transcription_id so you know which transcription has completed. If the transcription succeeded, you can then use that transcription_id to retrieve the transcript in the format you need.
An example of the payload sent to the callback URI is:
{
"transcription_id": "502f2a9dcb69454582b23309fc6609b5",
"result": "success"
}
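Receiving the callback requires an HTTP endpoint of your own, which is outside the scope of this page. Once your endpoint has the payload, checking it might look like the sketch below. The helper names are hypothetical, the sed-based parsing assumes a flat payload like the example, and since "success" is the only result value shown here, the sketch treats any other value as unsuccessful.

```shell
#!/usr/bin/env bash
# Sketch: inspect a callback payload such as the example above.
# json_string_field and handle_callback_payload are hypothetical helpers;
# how the payload reaches your script (web server, queue, ...) is up to you.

# Print the string value of field $1 from JSON on stdin (simple sed match,
# adequate for flat payloads like the example).
json_string_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Print the transcription_id and return 0 when result is "success";
# otherwise report the result on stderr and return 1.
handle_callback_payload() {
  local payload="$1" id result
  id="$(printf '%s\n' "$payload" | json_string_field transcription_id)"
  result="$(printf '%s\n' "$payload" | json_string_field result)"
  if [ "$result" = "success" ] && [ -n "$id" ]; then
    printf '%s\n' "$id"
  else
    echo "transcription ${id:-<unknown>} finished with result '${result}'" >&2
    return 1
  fi
}
```

The printed id can be passed straight to the retrieval step.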
Retrieving the final transcript
Retrieving the final transcript is performed with a GET request to the https://scribe.kensho.com/api/v2/transcription/{transcription_id} endpoint. By default a structured JSON transcript is returned, but you can request a different format by setting the Accept header to one of the output formats Scribe can generate.
# Make sure to replace the 'Transcription Id' with the id obtained after submitting the transcription.
# This assumes an ACCESS_TOKEN environment variable is set.
curl \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Accept: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
-o sample.docx \
https://scribe.kensho.com/api/v2/transcription/<Transcription Id>
This saves the transcript as sample.docx, a Microsoft Word document, if the transcription completed successfully.
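Downloading in a chosen format can be wrapped in a function that sets the Accept header only when a format is requested. This is a sketch with a hypothetical helper name; it uses curl's --fail option so that HTTP errors produce a non-zero exit status rather than an error body saved as the transcript, and it assumes that leaving the Accept header at curl's default yields the default structured JSON transcript.

```shell
#!/usr/bin/env bash
# Sketch: download a finished transcript in a chosen format.
# download_transcript is a hypothetical helper name.
SCRIBE_TRANSCRIPTION_URL="https://scribe.kensho.com/api/v2/transcription"
DOCX_TYPE="application/vnd.openxmlformats-officedocument.wordprocessingml.document"

# Usage: download_transcript <transcription_id> <output file> [Accept type]
# Without an Accept type, curl sends its default Accept header; this sketch
# assumes Scribe then returns its default structured JSON transcript.
download_transcript() {
  local id="$1" out="$2" accept="${3:-}"
  local args=(-sS --fail -o "$out" -H "Authorization: Bearer ${ACCESS_TOKEN}")
  if [ -n "$accept" ]; then
    args+=(-H "Accept: ${accept}")
  fi
  curl "${args[@]}" "${SCRIBE_TRANSCRIPTION_URL}/${id}"
}
```

For example, download_transcript "$id" sample.docx "$DOCX_TYPE" reproduces the curl example above, while download_transcript "$id" transcript.json fetches the default format.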