V2
Overview

Kensho Extract

The Kensho Extract API transforms PDF documents into structured JSON files. The model outputs one of two category sets, depending on the document_type setting:

  • If you set document_type to broker_research (the default), a hierarchical structure is returned that mirrors the intended hierarchy of the document:
    • Headers & Titles
    • Paragraphs
    • Tables & Table Titles
    • Figures & Figure Titles
    • Miscellaneous Text
  • If you set document_type to general, a non-hierarchical structure is returned:
    • Text
    • Tables
    • Figures
    • Titles

The JSON format supports both structures and leaves room for different category sets in the future.
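As a rough illustration of the two category sets, the sketch below stores them in a lookup keyed by document_type. The category labels are copied from the lists above; the names CATEGORY_SETS, DEFAULT_DOCUMENT_TYPE, and categories_for are hypothetical, and the labels that actually appear in the returned JSON may be spelled differently.

```python
from typing import Dict, FrozenSet

# Human-readable category labels taken from the overview lists.
# Assumption: the real JSON output may use different label spellings.
CATEGORY_SETS: Dict[str, FrozenSet[str]] = {
    "broker_research": frozenset({
        "Headers & Titles",
        "Paragraphs",
        "Tables & Table Titles",
        "Figures & Figure Titles",
        "Miscellaneous Text",
    }),
    "general": frozenset({"Text", "Tables", "Figures", "Titles"}),
}

# The overview states that broker_research is the default document_type.
DEFAULT_DOCUMENT_TYPE = "broker_research"


def categories_for(document_type: str = DEFAULT_DOCUMENT_TYPE) -> FrozenSet[str]:
    """Return the category set expected for a given document_type."""
    try:
        return CATEGORY_SETS[document_type]
    except KeyError:
        known = ", ".join(sorted(CATEGORY_SETS))
        raise ValueError(
            f"Unknown document_type {document_type!r}; expected one of: {known}"
        ) from None
```

Keeping the sets in one dictionary means a future category set could be supported by adding an entry rather than changing the lookup logic.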

The API works as follows:

After authentication, the user can submit PDF documents to the API along with a priority code. By default, documents are processed first in, first out, with one exception: any document marked as low priority is handled only after all high-priority documents are completed, regardless of when each was submitted.

The low-priority queue is intended for bulk document processing, so that large batches do not delay high-urgency documents that need a fast turnaround.
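The ordering rule can be pictured with a small local simulation. This is only a sketch of the described behavior, not the service's implementation: the class ExtractionQueue, the constants HIGH and LOW, and the choice of high priority as the default are assumptions made for illustration.

```python
import heapq
import itertools
from typing import List, Tuple

# Hypothetical priority codes; a smaller number is served first.
HIGH, LOW = 0, 1


class ExtractionQueue:
    """First in, first out within a priority; all HIGH items before any LOW item."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # A monotonically increasing counter preserves submission order
        # among documents that share the same priority.
        self._counter = itertools.count()

    def submit(self, name: str, priority: int = HIGH) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_document(self) -> str:
        _priority, _order, name = heapq.heappop(self._heap)
        return name

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = ExtractionQueue()
    queue.submit("bulk-1.pdf", LOW)
    queue.submit("urgent-1.pdf")        # defaults to HIGH in this sketch
    queue.submit("bulk-2.pdf", LOW)
    queue.submit("urgent-2.pdf", HIGH)
    while queue:
        print(queue.next_document())
    # prints urgent-1.pdf, urgent-2.pdf, bulk-1.pdf, bulk-2.pdf (one per line)
```

Even though bulk-1.pdf was submitted first, both urgent documents are served before it, which matches the rule that low-priority work waits for high-priority work.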

After a document is submitted, the API returns a unique request_id key, which can be used in a subsequent query to retrieve the document output.
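The submit-then-retrieve pattern can be sketched as a polling loop around the request_id. This overview does not specify endpoints, status codes, or response shapes, so the sketch takes a caller-supplied fetch function (hypothetical) that returns None while the output is not yet available; wait_for_output and its parameters are likewise illustrative names rather than part of the API.

```python
import time
from typing import Any, Callable, Dict, Optional


def wait_for_output(
    request_id: str,
    fetch: Callable[[str], Optional[Dict[str, Any]]],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the output for request_id is available.

    fetch is a hypothetical callable wrapping the real retrieval query.
    It should return the parsed JSON output, or None if the document
    is still being processed.
    """
    for attempt in range(max_attempts):
        output = fetch(request_id)
        if output is not None:
            return output
        # Do not sleep after the final unsuccessful attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(
        f"No output for request_id {request_id!r} after {max_attempts} attempts"
    )
```

Passing fetch and sleep as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub. For documents sent to the low-priority queue, a longer interval is likely appropriate, since they wait behind high-priority work.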

Get Started

You can begin using Kensho Extract in seconds via our REST API.

To sign up, email support@kensho.com to have your API profile set up.

Then, to start extracting documents with Kensho Extract, visit our authentication guide or reference the full API Documentation.