V3 (latest)
Overview

Kensho Extract

The Kensho Extract API transforms PDF documents into structured JSON files. There are two model output types. Read the descriptions of both below to determine which one best suits your use case:

hierarchical: The hierarchical model reproduces the structure of the document, mirroring its intended hierarchy.

  • Titles, Subtitles & Footers
  • Paragraphs
  • Tables & Table Titles
  • Figures & Figure Titles
  • Miscellaneous Text

general: The general model provides a non-hierarchical structure of the document. We recommend trying the hierarchical model first; if you are not satisfied with the output, or do not require a hierarchical structure, try the general model.

  • Text
  • Tables
  • Figures
  • Titles

API V3 Feature Updates

  • Enhanced Table Extraction: This feature improves recognition of rows and columns, and it also provides best-in-class support for challenging elements such as merged cells and column headers. Set “enhanced_table_extraction” to “true” to use our latest model to extract tables from your documents. Please note: this feature should not be set to “true” for scanned documents.
  • Optical Character Recognition (OCR): Kensho Extract now offers OCR on scanned documents. This capability is a beta release and is still undergoing testing and refinement to ensure accuracy and reliability across a range of scenarios.
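
The constraint in the note above (enhanced table extraction should stay off for scanned documents) can be checked on the client side before a request is sent. The following is a minimal sketch, not part of any official SDK: the class name ExtractOptions, the "model" key, and the "scanned" flag are hypothetical names chosen for illustration. Only “enhanced_table_extraction” and the two model names come from the text above.

```python
from dataclasses import dataclass, asdict

# The two model output types named in this overview.
MODELS = {"hierarchical", "general"}


@dataclass(frozen=True)
class ExtractOptions:
    """Hypothetical container for request options (illustration only)."""

    model: str = "hierarchical"              # assumed key name
    enhanced_table_extraction: bool = False  # option named in the text above
    scanned: bool = False                    # assumed flag: True for scanned PDFs

    def __post_init__(self):
        # Reject model names other than the two documented ones.
        if self.model not in MODELS:
            raise ValueError(
                f"unknown model {self.model!r}; expected one of {sorted(MODELS)}"
            )
        # Enforce the note above: no enhanced table extraction on scans.
        if self.enhanced_table_extraction and self.scanned:
            raise ValueError(
                "enhanced_table_extraction should not be enabled for scanned documents"
            )

    def to_params(self) -> dict:
        """Return the options as a plain dict."""
        return asdict(self)
```

With this sketch, ExtractOptions(model="general", enhanced_table_extraction=True) is accepted, while combining enhanced_table_extraction=True with scanned=True raises a ValueError. The actual request parameter names should be taken from the full API Documentation.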

Get Started

You can begin using Kensho Extract in seconds via our REST API.

The API behaves in the following fashion:

  • After authenticating, the user can submit PDF documents to the API, each with a priority code. By default, the API processes documents first in, first out, with one exception: any document marked as low priority is handled only after high priority documents are completed, regardless of when it was submitted.

  • The low priority queue is intended for bulk document processing, so that bulk jobs do not delay high-urgency documents that may need a fast turnaround.

  • After a document is submitted, the API returns a unique request_id, which can be used in a subsequent query to retrieve the document output.
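
The queueing rules above can be illustrated with a small in-memory model. This is a toy sketch of the described behavior, not the service's implementation and not a client for the real API: the class ExtractQueueSketch, the HIGH and LOW constants, and the use of a UUID as the request_id are all assumptions made for illustration.

```python
import heapq
import itertools
import uuid

# Assumed priority codes; the real API's values may differ.
HIGH, LOW = 0, 1


class ExtractQueueSketch:
    """Toy in-memory model of the queueing rules described above."""

    def __init__(self):
        self._heap = []
        # Monotonic counter records submission order (the FIFO tie-breaker).
        self._counter = itertools.count()

    def submit(self, document_name, priority=HIGH):
        """Queue a document and return a unique request_id for later lookup."""
        request_id = str(uuid.uuid4())
        entry = (priority, next(self._counter), request_id, document_name)
        heapq.heappush(self._heap, entry)
        return request_id

    def process_next(self):
        """Pop the next document: high priority first, FIFO within a priority."""
        _priority, _order, request_id, document_name = heapq.heappop(self._heap)
        return request_id, document_name


queue = ExtractQueueSketch()
bulk_id = queue.submit("archive.pdf", priority=LOW)  # bulk job, submitted first
urgent_id = queue.submit("filing.pdf")               # urgent job, submitted second
order = [queue.process_next()[1] for _ in range(2)]
print(order)  # ['filing.pdf', 'archive.pdf']
```

Sorting on the (priority, submission order) pair is what lets the later high priority document overtake the earlier low priority one while keeping first in, first out order within each priority level.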

To sign up, please email support@kensho.com to set up your API profile.

Then, to start extracting documents with Kensho Extract, see our authentication guide or refer to the full API Documentation.