Human in the loop development
Human in the loop (HITL) transcription uses human transcribers to correct and format transcriptions, producing exceptionally high-quality transcripts. HITL is not available by default and must be enabled for you by our support team before it can be used. For additional information on HITL, please contact the Scribe team.
Before developing for HITL, it is advisable to become familiar with developing against the Batch API for AI transcription, since the two are very similar. The key differences between AI and HITL are the transcription options, the output formats, and the time to completion. Because HITL requires human review, turnaround times will be considerably longer than with AI, but will remain within contractual obligations. Additional HITL specifics from the API specification include:
- The transcriber option must be set to human when submitting the transcription request.
- The hotwords option will be ignored, since humans correct the transcriptions.
- The media_language option can be provided to perform translation before transcription if your contract allows it. The output will always be in English, but the input can be any of the ISO 639-1 or ISO 639-3 language codes that we accept. If you specify a language other than English (en or eng) and your contract does not allow translation, your requests will ultimately fail; if you specify the wrong code (for example, fr instead of es), there could be a delay while we find the correct translator.
- The priority option can be provided to alter the turnaround time based on information in your contract. The exact mapping between the values and the turnaround times will vary, and the Scribe team can tell you exactly what the mapping is. For example, if your contract has 8, 12 and 24 hour turnaround times then we'd expect high, medium and low priorities respectively; if your contract has just 8 and 12 hour turnaround times then we'd expect high and medium priorities respectively.
- The context option can be provided to give the human transcription team additional context about the audio, including the speakers, their bios, dates, etc.
- You will only be able to retrieve transcripts as .docx Word documents. That means you will always need to set the correct Accept header when GETing the results. For HITL, the Accept header should always be set to application/vnd.openxmlformats-officedocument.wordprocessingml.document when retrieving the final transcripts.
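As a minimal sketch, the options listed above could be combined into a single options payload like this. The helper name hitl_options is hypothetical, the values es and high are placeholders, and placing media_language alongside transcriber and priority in the same options object is an assumption based on the list above; check your contract for the languages and priorities that apply to you.

```shell
# Hypothetical helper (not part of Scribe): prints an options JSON document
# for a HITL request.
# Usage: hitl_options <media_language> <priority>
hitl_options() {
  # "transcriber" must be "human" for HITL.
  # "hotwords" is left out because it is ignored for HITL requests.
  printf '{"transcriber": "human", "media_language": "%s", "priority": "%s"}\n' "$1" "$2"
}

# Example: Spanish audio with the fastest turnaround in the contract.
hitl_options es high
# prints: {"transcriber": "human", "media_language": "es", "priority": "high"}
```

The printed JSON can then be used as the options value when submitting the transcription request.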
Best Practices
Webhooks
Due to the time it takes to return a HITL transcription it is advisable to make use of webhooks (over polling) to know when transcriptions are complete. Setting up HITL for webhook notification, and the final notification of completion, is exactly the same as it is for AI transcriptions. You can certainly poll for completion, but you might want to reduce the frequency of polls to once every 10 to 15 minutes.
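If you do poll, a low-frequency loop along the following lines may be enough. This is only a sketch: poll_until and transcript_ready are hypothetical names, and the completion check is an assumption (it treats a successful .docx download as "complete"). How the API responds while a transcription is still in progress is not described here, so adjust the check to the behavior you observe.

```shell
# Generic polling helper: runs a check command until it succeeds.
# Usage: poll_until <interval_seconds> <max_attempts> <command> [args...]
poll_until() {
  interval="$1"
  max_attempts="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt remains.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical completion check (an assumption, not documented API behavior):
# succeeds when the transcript download returns a successful HTTP status.
# Usage: transcript_ready <transcription_id> <output_file>
transcript_ready() {
  curl --fail --silent \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Accept: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
    -o "$2" \
    "https://scribe.kensho.com/api/v2/transcription/$1"
}

# Example (not run here): check every 15 minutes, for up to 24 hours (96 attempts).
# poll_until 900 96 transcript_ready "<Transcription Id>" sample.docx
```

Keeping the loop separate from the check makes it easy to swap in a different completion test without touching the timing logic.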
Additional Context
There may be additional information / context available about the call that you would like to pass on to the human team doing the transcription. This could include a reference or call ID that you know about internally, a summary of the audio, information about the speakers on the audio call, etc. This information can help our transcription team provide higher quality transcripts, faster. This additional context is passed with the context option when the transcription job is initially submitted.
The additional context can be provided in any form, from a string to a dictionary of values, but a dictionary of keys and values is preferred. There are no fixed keys for the context dictionary, but the following are recommended:
- id - An ID for your own reference, quite possibly one that you are tracking in your own system. This could be a UUID, a short string or something else that you use to reference transcriptions internally.
- title - A short name for the transcription.
- description - A description of the transcription, possibly with speakers, speaker biographies or an agenda.
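As an illustration of the recommended keys, the sketch below builds an options document that carries a context dictionary. The helper name hitl_options_with_context and all example values are hypothetical; the only assumption about the API is that context sits alongside transcriber in the options, as described above.

```shell
# Hypothetical helper (not part of Scribe): prints HITL options that include
# a context dictionary with the recommended id, title and description keys.
# Usage: hitl_options_with_context <id> <title> <description>
# Note: values are inserted verbatim, so they must not contain
# double quotes or backslashes.
hitl_options_with_context() {
  printf '{"transcriber": "human", "context": {"id": "%s", "title": "%s", "description": "%s"}}\n' \
    "$1" "$2" "$3"
}

# Example with made-up values.
hitl_options_with_context "call-1234" "Q3 earnings call" "Speakers: CEO and CFO"
# prints: {"transcriber": "human", "context": {"id": "call-1234", "title": "Q3 earnings call", "description": "Speakers: CEO and CFO"}}
```

For values that may contain quotes or other special characters, a JSON-aware tool is a safer way to build the document than printf.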
Starting a New Transcription
If the audio / video data to transcribe is 'local' to where your code runs (easily accessible, on the same file system, on an internal object storage network, etc) then Scribe will accept the audio / video during the transcription request.
# Assuming a local mp3 file named 'sample.mp3' and ACCESS_TOKEN environment variable is set.
curl -XPOST \
-F "media=@sample.mp3;type=application/mp3" \
-F 'options={"transcriber": "human", "priority": "medium"};type=application/json' \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
https://scribe.kensho.com/api/v2/transcription
If the file exists and your access token is valid, you should get a transcription id like:
{
"transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}
which you can use later to check if the transcription is complete or to retrieve the final transcript.
Transcribing with Remote Data
If the audio / video data to transcribe is 'remote' (stored in a separate cloud service, etc) and:
- that data can be retrieved through a RESTful GET call
- no header-based authentication is needed to retrieve the data (for example, no Authorization header); mechanisms like presigned URLs are, however, fine
then Scribe will accept a URL from which it will attempt to retrieve the audio / video file to transcribe for the request.
# Make sure to replace <Media URL> with the URL where the data lives (keep the surrounding double quotes).
# This assumes an ACCESS_TOKEN environment variable is set.
curl -XPOST \
-d '{"media_url": "<Media URL>", "transcriber": "human", "priority": "medium"}' \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
https://scribe.kensho.com/api/v2/transcription
If the file exists and your access token is valid, you should get a transcription id like:
{
"transcription_id": "502f2a9dcb69454582b23309fc6609b5"
}
which you can use later to check if the transcription is complete or to retrieve the final transcript.
Retrieving the Final Transcript
Retrieving the final transcript is performed with a GET request to the https://scribe.kensho.com/api/v2/transcription/{transcription_id} endpoint.
You will need to set the Accept header to application/vnd.openxmlformats-officedocument.wordprocessingml.document to retrieve the resulting Word document transcript.
# Make sure to replace the 'Transcription Id' with the id obtained after submitting the transcription.
# This assumes an ACCESS_TOKEN environment variable is set.
# This will result in a 'sample.docx' Word document on success.
curl \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Accept: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
-o sample.docx \
https://scribe.kensho.com/api/v2/transcription/<Transcription Id>
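Because the result is written straight to a file, it can be worth confirming that what was saved really is a Word document. The sketch below relies on the fact that a .docx file is a ZIP-based package and therefore starts with the bytes "PK". The function names looks_like_docx and fetch_transcript are hypothetical, and the use of curl's --fail option to treat HTTP errors as failures is a choice made for this example.

```shell
# Returns success if the file starts with the ZIP signature "PK",
# which every .docx file does because .docx is a ZIP-based format.
looks_like_docx() {
  [ "$(head -c 2 "$1" 2>/dev/null)" = "PK" ]
}

# Hypothetical wrapper (not part of Scribe): downloads the transcript and
# fails if the HTTP request fails or the saved file does not look like a
# Word document.
# Usage: fetch_transcript <transcription_id> <output_file>
fetch_transcript() {
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${ACCESS_TOKEN}" \
    -H "Accept: application/vnd.openxmlformats-officedocument.wordprocessingml.document" \
    -o "$2" \
    "https://scribe.kensho.com/api/v2/transcription/$1" \
    && looks_like_docx "$2"
}

# Example (not run here):
# fetch_transcript "<Transcription Id>" sample.docx || echo "download failed or file is not a .docx" >&2
```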