Human in the loop development

Human in the loop (HITL) transcription uses human transcribers to correct and format transcriptions, producing exceptionally high-quality transcripts. HITL is not available by default and must be enabled for you by our support team before it can be used. For additional information on HITL, please contact the Scribe team.

Before developing for HITL, it is advisable to get familiar with how to develop against the Batch API for AI transcription, since the two are very similar. The key differences between AI and HITL are the transcription options, the output formats, and the time to completion. Because HITL requires human review, turnaround times will be considerably longer than with AI, but within contractual obligations. Additional HITL specifics, taken from the API specifications, include:

  • The transcriber option must be set to human when submitting the transcription request
  • The hotwords option will be ignored since we are using humans to correct transcriptions
  • The media_language option can be provided to perform translation before transcription if your contract allows it. The output will always be in English, but the input can be any of the ISO-639-1 or ISO-639-3 language codes that we accept. If you specify a language other than English (en or eng) and your contract does not allow for translations, your requests will ultimately fail; if you specify the wrong code (e.g., fr instead of es), there could be a delay while we find the correct translator.
  • The priority option can be provided to alter the turnaround time based on information in your contract. The exact mapping between the values and the turnaround times will vary, and the Scribe team can tell you exactly what the mapping is. For example, if your contract has 8, 12 and 24 hour turnaround times then we'd expect high, medium and low priorities respectively; if your contract has just 8 and 12 hour turnaround times then we'd expect high and medium priorities respectively.
  • The context option can be provided to give the human transcription team additional context about the audio, including the speakers, their bios, dates, etc.
  • You can only retrieve transcripts as .docx Word documents, so you must always set the Accept header to application/vnd.openxmlformats-officedocument.wordprocessingml.document when retrieving the final transcript with a GET request.
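
As a rough illustration of how these options fit together, the sketch below builds an options payload and submits a local file. The function names are hypothetical, the values are placeholders, and it assumes media_language sits in the same options object as transcriber and priority; check the API specification and your contract before relying on it.

```shell
# Build a HITL options JSON string (hypothetical helper, not part of Scribe).
#   $1 = priority        (e.g. high, medium, low; mapping depends on your contract)
#   $2 = media_language  (ISO-639-1 or ISO-639-3 code, e.g. en or es)
build_hitl_options() {
  printf '{"transcriber": "human", "priority": "%s", "media_language": "%s"}' "$1" "$2"
}

# Submit a local mp3 file with the given options JSON.
#   $1 = path to the mp3 file, $2 = options JSON
# Assumes the ACCESS_TOKEN environment variable is set.
submit_hitl() {
  curl -XPOST \
    -F "media=@$1;type=application/mp3" \
    -F "options=$2;type=application/json" \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    https://scribe.kensho.com/api/v2/transcription
}
```

For example, submit_hitl sample.mp3 "$(build_hitl_options high es)" would request a high-priority job for Spanish audio, assuming your contract allows translation.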

Best Practices

Webhooks

Because a HITL transcription takes a long time to return, it is advisable to use webhooks (rather than polling) to know when transcriptions are complete. Setting up webhook notification for HITL, and the final notification of completion, is exactly the same as it is for AI transcriptions. You can still poll for completion, but you should reduce the frequency of polls to once every 10 to 15 minutes.
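
If you do poll, a slow loop along the following lines may be enough. This is only a sketch: poll_until and transcript_ready are hypothetical helpers, TRANSCRIPTION_ID is an assumed environment variable, and the completion check assumes the transcript endpoint answers with HTTP 200 only once the .docx is ready, which you should confirm against the API specification.

```shell
# poll_until CHECK_CMD INTERVAL_SECONDS MAX_ATTEMPTS
# Runs CHECK_CMD until it succeeds (exit 0) or the attempts run out.
poll_until() {
  check_cmd=$1
  interval=$2
  max_attempts=$3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$check_cmd"; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# ASSUMPTION: HTTP 200 means the transcript is ready. Not confirmed by this guide.
# Requires ACCESS_TOKEN and TRANSCRIPTION_ID to be set in the environment.
transcript_ready() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Accept: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
    "https://scribe.kensho.com/api/v2/transcription/${TRANSCRIPTION_ID}")
  [ "$code" = "200" ]
}
```

Calling poll_until transcript_ready 900 96 would check every 15 minutes for roughly 24 hours before giving up.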

Additional Context

There may be additional information / context available about the call that you would like to pass on to the human team doing the transcription. This could include a reference or call ID that you know about internally, a summary of the audio, information about the speakers on the audio call, etc. This information can help our transcription team provide higher quality transcripts, faster. This additional context is passed with the context option when the transcription job is initially submitted.

The additional context can be provided in any form, from a string to a dictionary of values, but a dictionary of keys and values is preferred. There are no fixed keys for the context dictionary, but the following are recommended:

  • id - An ID for your own reference, quite possibly one in your system that you are tracking. This could be a UUID, a short string or something else that you use to reference transcriptions internally.
  • title - A short name for the transcription
  • description - A description of the transcription, possibly with speakers, speaker biography or agenda.
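
A context dictionary using the recommended keys might be assembled as in the sketch below. The helper name and the sample values are hypothetical, and the function does no JSON escaping, so it assumes its arguments contain no double quotes, backslashes or semicolons.

```shell
# Build HITL options that carry a context dictionary (hypothetical helper).
#   $1 = id, $2 = title, $3 = description
# No escaping is performed; keep the arguments free of ", \ and ; characters.
build_context_options() {
  printf '{"transcriber": "human", "priority": "medium", "context": {"id": "%s", "title": "%s", "description": "%s"}}' \
    "$1" "$2" "$3"
}

# Illustrative values only.
HITL_OPTIONS=$(build_context_options \
  "call-0001" \
  "Quarterly update" \
  "Speakers: CEO and CFO. Agenda: results, then questions.")
```

The resulting string can then be passed as the options form field, for example -F "options=${HITL_OPTIONS};type=application/json", when the transcription job is submitted.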

Starting a New Transcription

If the audio / video data to transcribe is 'local' to where your code runs (easily accessible, on the same file system, on an internal object storage network, etc) then Scribe will accept the audio / video during the transcription request.

# Assuming a local mp3 file named 'sample.mp3' and ACCESS_TOKEN environment variable is set.
curl -XPOST \
   -F "media=@sample.mp3;type=application/mp3" \
   -F 'options={"transcriber": "human", "priority": "medium"};type=application/json' \
   -H "Authorization: Bearer ${ACCESS_TOKEN}" \
   https://scribe.kensho.com/api/v2/transcription

If the file exists and your access token is valid, you should get a transcription id like:

{
  "transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}

which you can use later to check if the transcription is complete or to retrieve the final transcript.
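
In a script you will usually want to capture that id rather than copy it by hand. The helper below is a minimal sketch that pulls the transcription_id value out of a response shaped like the one above using sed; the function name is hypothetical, and a proper JSON parser is a safer choice if one is available.

```shell
# Read a JSON response on stdin and print the value of "transcription_id".
# Assumes the key and its value appear on the same line, as in the sample response.
extract_transcription_id() {
  sed -n 's/.*"transcription_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, piping the output of the curl command above into extract_transcription_id and storing the result in a variable keeps the id available for later requests.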

Transcribing with Remote Data

If the audio / video data to transcribe is 'remote' (stored in a separate cloud service, etc) and:

  • that data can be retrieved through a RESTful GET call
  • no header-based authentication is needed to retrieve the data (i.e., no Authorization header); mechanisms like presigned URLs are, however, fine.

then Scribe will accept a URL where it will attempt to retrieve the audio / video file to transcribe for the request.

# Make sure to replace the 'Media URL' where the data lives
# This assumes an ACCESS_TOKEN environment variable is set.
curl -XPOST \
   -d '{"media_url": "<Media URL>", "transcriber": "human", "priority": "medium"}' \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer ${ACCESS_TOKEN}" \
   https://scribe.kensho.com/api/v2/transcription

If the file exists and your access token is valid, you should get a transcription id like:

{
  "transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}

which you can use later to check if the transcription is complete or to retrieve the final transcript.
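
Because the request body above is wrapped in single quotes, shell variables inside it are not expanded. If the media URL lives in a variable, one option is to build the body with printf, as in this sketch. The helper name is hypothetical, and it assumes the URL contains no double quotes or backslashes.

```shell
# Build the JSON body for a remote-media HITL request (hypothetical helper).
#   $1 = media URL (e.g. a presigned URL), $2 = priority
# The URL is inserted as-is, so it must not contain " or \ characters.
build_remote_body() {
  printf '{"media_url": "%s", "transcriber": "human", "priority": "%s"}' "$1" "$2"
}
```

The output can be passed to curl with -d "$(build_remote_body "$MEDIA_URL" medium)", where MEDIA_URL is a variable you set yourself.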


Retrieving the Final Transcript

The final transcript is retrieved with a GET request to the https://scribe.kensho.com/api/v2/transcription/{transcription_id} endpoint. You will need to set the Accept header to application/vnd.openxmlformats-officedocument.wordprocessingml.document to retrieve the resulting Word document transcript.

# Make sure to replace the 'Transcription Id' with the id obtained after submitting the transcription.
# This assumes an ACCESS_TOKEN environment variable is set.
# This will result in a 'sample.docx' Word document on success.
curl \
   -H "Authorization: Bearer ${ACCESS_TOKEN}" \
   -H "Accept: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
   -o sample.docx \
   https://scribe.kensho.com/api/v2/transcription/<Transcription Id>
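
Note that curl with -o writes whatever body the server returns, so an error response would also end up in sample.docx. As a quick sanity check, the sketch below relies on the fact that .docx files are ZIP archives, which begin with the two bytes PK. The function name is hypothetical, and this does not validate the document beyond that signature.

```shell
# Return success if the file starts with the ZIP signature "PK",
# which every .docx file (a ZIP archive) is expected to have.
looks_like_docx() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}
```

For example, looks_like_docx sample.docx || echo "download does not look like a Word document" would flag a response that is not a transcript.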