Batch API development

Scribe's Batch API allows for asynchronous transcription of audio / video files through a RESTful API.

There are three steps to this asynchronous transcription:

  1. Start the transcription by sending Scribe the audio / video to transcribe.
  2. Wait for the transcription to complete.
  3. Retrieve the transcript in the desired output format.

The following sections give examples of each of these three steps. The examples have these prerequisites:

# All Shell examples on this page require the following:
# - curl (https://everything.curl.dev/get)

Additionally, all examples use a Kensho access token stored in the ACCESS_TOKEN environment variable. Additional information on access tokens, and how to obtain them, can be found in the authentication guide.
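Before running the examples, it can help to confirm both prerequisites are in place. The following is a minimal sketch, not part of Scribe itself; the function name check_prerequisites is hypothetical.

```shell
# Hypothetical pre-flight check: succeed only if curl is available and
# ACCESS_TOKEN is set to a non-empty value.
check_prerequisites() {
  if ! command -v curl >/dev/null 2>&1; then
    echo "curl not found; see https://everything.curl.dev/get" >&2
    return 1
  fi
  if [ -z "${ACCESS_TOKEN:-}" ]; then
    echo "ACCESS_TOKEN is not set; see the authentication guide" >&2
    return 1
  fi
}

# Usage:
#   check_prerequisites || exit 1
```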


Starting a new transcription

To start a new transcription of an audio / video file, send a POST request to Scribe's API endpoint https://scribe.kensho.com/api/v2/transcription. The request data can use one of two content types:

  1. Multipart Form-Data (multipart/form-data) which allows for transcribing with local data.
  2. JSON (application/json) which allows for transcribing with remote data.

Either content type will allow additional JSON options for use during transcription.

Transcribing with local data

If the audio / video data to transcribe is 'local' to where your code runs (easily accessible, on the same file system, on an internal object storage network, etc.), you can send it to Scribe directly in the transcription request.

# Assuming a local mp3 file named 'sample.mp3' and ACCESS_TOKEN environment variable is set.
curl -XPOST \
   -F "media=@sample.mp3;type=application/mp3" \
   -H "Authorization: Bearer ${ACCESS_TOKEN}" \
   https://scribe.kensho.com/api/v2/transcription

If the file exists and your access token is valid, you should get a transcription id like:

{
  "transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}

which you can use later to check if the transcription is complete or to retrieve the final transcript.
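To reuse the id in later calls, it is convenient to capture it in a shell variable. The following is a minimal sketch, assuming the response is a single flat JSON object like the one above. The function name extract_transcription_id is hypothetical, and it relies on plain-text matching with sed rather than a real JSON parser, so a JSON-aware tool would be more robust if one is available.

```shell
# Hypothetical helper: read a JSON response on stdin and print the value of
# "transcription_id". This is a plain-text match, not a full JSON parser.
extract_transcription_id() {
  tr -d '\n' |
    sed -n 's/.*"transcription_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (requires network access and a valid ACCESS_TOKEN):
#   TRANSCRIPTION_ID=$(curl -s -XPOST \
#      -F "media=@sample.mp3;type=application/mp3" \
#      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
#      https://scribe.kensho.com/api/v2/transcription | extract_transcription_id)
```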

Transcribing with remote data

If the audio / video data to transcribe is 'remote' (stored in a separate cloud service, etc.) and:

  • the data can be retrieved through a RESTful GET call
  • no header-based authentication is needed to retrieve the data (i.e., no Authorization header); mechanisms such as presigned URLs are fine

then Scribe will accept a URL and attempt to retrieve the audio / video file from it for the request.

# Replace <Media URL> with the URL where the data lives, keeping the surrounding double quotes.
# This assumes an ACCESS_TOKEN environment variable is set.
curl -XPOST \
   -d '{"media_url": "<Media URL>"}' \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer ${ACCESS_TOKEN}" \
   https://scribe.kensho.com/api/v2/transcription

If the file exists and your access token is valid, you should get a transcription id like:

{
  "transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}

which you can use later to check if the transcription is complete or to retrieve the final transcript.
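When the media URL comes from a variable, the JSON body has to stay valid: the URL must be a quoted JSON string. The following is a minimal sketch of building that body in the shell. The function name build_media_payload and the MEDIA_URL variable are hypothetical, and only backslashes and double quotes are escaped, which should be enough for typical URLs.

```shell
# Hypothetical helper: print a JSON body of the form {"media_url": "..."}.
# Backslashes and double quotes in the URL are escaped so the JSON stays valid.
# The body runs in a subshell so the temporary variable does not leak.
build_media_payload() (
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"media_url": "%s"}\n' "$escaped"
)

# Usage (requires network access and a valid ACCESS_TOKEN):
#   curl -XPOST \
#      -d "$(build_media_payload "$MEDIA_URL")" \
#      -H "Content-Type: application/json" \
#      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
#      https://scribe.kensho.com/api/v2/transcription
```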


Waiting for the transcription to complete

A transcription can take a few minutes to complete. During that time you can 'poll' Scribe to check whether it has finished. Alternatively, or additionally, if you provided a callback URI when you submitted the transcription, Scribe will attempt to notify you when the transcript is complete. Scribe therefore supports two non-exclusive ways to determine when a transcription is complete:

  1. Polling to determine when the transcription is complete.
  2. Notification through a callback / webhook when the transcription is complete.

Polling for completion

To poll for completion you will need the transcription_id that was returned from the original transcription request. Polling is a HEAD call to the https://scribe.kensho.com/api/v2/transcription/{transcription_id} endpoint, and only the user who originally submitted the request can poll for its completion.

# Make sure to replace the 'Transcription Id' with the id obtained after submitting the transcription.
# This assumes an ACCESS_TOKEN environment variable is set.
curl -I \
   -H "Authorization: Bearer ${ACCESS_TOKEN}" \
   https://scribe.kensho.com/api/v2/transcription/<Transcription Id>

This command returns an HTTP/2 <status code> response, where:

  • a 200 indicates the transcription is complete
  • a 202 indicates the transcription is still in progress
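The two status codes above can drive a simple polling loop. The following is a minimal sketch under stated assumptions: the function names, the POLL_INTERVAL and MAX_ATTEMPTS variables, and their defaults are all hypothetical, and treating any status other than 200 or 202 as an error is a choice made here, not documented Scribe behavior.

```shell
# Hypothetical helper: print only the HTTP status code of the HEAD request
# for the given transcription id (curl's -w '%{http_code}' prints the code).
transcription_status() {
  curl -s -o /dev/null -I -w '%{http_code}' \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    "https://scribe.kensho.com/api/v2/transcription/$1"
}

# Poll until the transcription completes.
# Returns 0 on a 200, 1 on any status other than 200 or 202 (an assumption),
# and 2 if MAX_ATTEMPTS polls all returned 202.
wait_for_transcription() (
  id=$1
  interval=${POLL_INTERVAL:-30}   # seconds between polls (illustrative default)
  attempts=${MAX_ATTEMPTS:-40}    # give up after this many polls (illustrative)
  i=0
  while [ "$i" -lt "$attempts" ]; do
    code=$(transcription_status "$id")
    case "$code" in
      200) return 0 ;;
      202) sleep "$interval" ;;
      *)   echo "unexpected status: $code" >&2; return 1 ;;
    esac
    i=$((i + 1))
  done
  return 2
)

# Usage (requires network access and a valid ACCESS_TOKEN):
#   wait_for_transcription "<Transcription Id>" && echo "transcription complete"
```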

Notification of completion

To be notified when a transcription is complete, provide the callback_uri option when submitting the transcription request. When transcription finishes, Scribe will attempt to call that URI and send the transcription_id in the message so you know which transcription has completed. If the transcription was successful, you can then use that transcription_id to retrieve the transcript in the format you need.

An example of the payload sent to the callback URI is:

{
  "transcription_id": "502f2a9dcb69454582b23309fc6609b5",
  "result": "success"
}
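Whatever service receives the callback needs to pull the id out of the payload and check the result. The following is a minimal sketch of that step only, assuming the payload is a flat JSON object like the one above. The function name handle_callback_payload is hypothetical, and because only the "success" value is shown in this guide, any other result value is treated here as "do not retrieve".

```shell
# Hypothetical helper: read a callback payload on stdin. If "result" is
# "success", print the transcription id and return 0; otherwise return 1.
# Uses plain-text matching with sed, not a full JSON parser.
handle_callback_payload() (
  payload=$(tr -d '\n')
  result=$(printf '%s' "$payload" |
    sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  id=$(printf '%s' "$payload" |
    sed -n 's/.*"transcription_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  if [ "$result" != "success" ] || [ -z "$id" ]; then
    return 1
  fi
  printf '%s\n' "$id"
)

# Usage, with the payload saved to a hypothetical file payload.json:
#   TRANSCRIPTION_ID=$(handle_callback_payload < payload.json) || echo "not ready"
```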

Retrieving the final transcript

To retrieve the final transcript, send a GET request to the https://scribe.kensho.com/api/v2/transcription/{transcription_id} endpoint. By default a structured JSON transcript is returned; to get a different format, set the Accept header to one of the output formats Scribe can generate.

# Make sure to replace the 'Transcription Id' with the id obtained after submitting the transcription.
# This assumes an ACCESS_TOKEN environment variable is set.
curl \
   -H "Authorization: Bearer ${ACCESS_TOKEN}" \
   -H "Accept: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
   -o sample.docx \
   https://scribe.kensho.com/api/v2/transcription/<Transcription Id>

This command produces a sample.docx Microsoft Word document if the transcription completed successfully.
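The same request can be wrapped so that the output format is optional and HTTP errors are not silently saved as if they were a transcript. The following is a minimal sketch. The function name fetch_transcript is hypothetical, and it assumes that omitting an explicit Accept header (curl then sends its own default, Accept: */*) yields the default JSON transcript.

```shell
# Hypothetical helper: download a transcript to a file.
#   $1 = transcription id, $2 = output file, $3 = optional Accept header value.
# Without $3, no explicit Accept header is set (assumed to give the default
# JSON transcript). --fail makes curl exit non-zero on HTTP 4xx/5xx responses
# instead of writing the error body to the output file.
fetch_transcript() {
  if [ -n "${3:-}" ]; then
    curl -s --fail \
      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      -H "Accept: $3" \
      -o "$2" \
      "https://scribe.kensho.com/api/v2/transcription/$1"
  else
    curl -s --fail \
      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      -o "$2" \
      "https://scribe.kensho.com/api/v2/transcription/$1"
  fi
}

# Usage (requires network access and a valid ACCESS_TOKEN):
#   fetch_transcript "<Transcription Id>" sample.json
#   fetch_transcript "<Transcription Id>" sample.docx \
#     application/vnd.openxmlformats-officedocument.wordprocessingml.document
```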