Figure Extraction
Welcome to the release of our Figure Extraction feature! This functionality allows users to extract data from charts and figures in PDF documents.
Note
Figure Extraction is currently available only through our API. We will add support for it in our User Interface in the coming weeks.
Key Highlights
- Extract raw data from figures effortlessly.
- In our first release, we focus specifically on data extraction from bar charts. Vertical, horizontal, grouped, and stacked bar charts are all supported.
- We recommend using document_type=hierarchical_v2 with Figure Extraction. This gives the best performance in our testing.
Getting Started
To apply Figure Extraction to your document, add "figure_extraction": "true" to the data dictionary in requests.post:
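import requests

# api_url and files come from your existing upload code.
response = requests.post(
    api_url,
    files=files,
    data={
        "document_type": "hierarchical_v2",
        # Enables Figure Extraction; note the value is the string "true".
        "figure_extraction": "true",
    },
)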
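A slightly fuller sketch of the same call may help when wiring this into a script. It is illustrative only: `build_request_data` and `submit_pdf` are hypothetical helper names, and the `file` form field name, the optional `headers` argument for authentication, and the timeout value are assumptions that this release note does not confirm. Only the two `data` fields come from the example above.

```python
from typing import Optional


def build_request_data(figure_extraction: bool = True,
                       document_type: str = "hierarchical_v2") -> dict:
    """Build the form fields passed as `data` to requests.post."""
    data = {"document_type": document_type}
    if figure_extraction:
        # The flag is sent as the string "true", matching the example above.
        data["figure_extraction"] = "true"
    return data


def submit_pdf(api_url: str, pdf_path: str,
               headers: Optional[dict] = None,
               timeout: float = 300.0):
    """Upload a PDF with Figure Extraction enabled (hypothetical helper)."""
    import requests  # third-party dependency, imported only when uploading

    with open(pdf_path, "rb") as pdf:
        # Assumption: the upload field is named "file"; check your API setup.
        files = {"file": pdf}
        response = requests.post(
            api_url,
            files=files,
            data=build_request_data(),
            headers=headers,   # assumption: auth is passed via headers
            timeout=timeout,   # generous, since extraction adds latency
        )
    # Surface HTTP errors early instead of parsing an error body.
    response.raise_for_status()
    return response
```

Keeping the field construction in its own function makes it easy to turn Figure Extraction off for documents without charts, for example `build_request_data(figure_extraction=False)`.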
Important Notes
- Expect higher latency and lower throughput when Figure Extraction is enabled.
- Today we support bar charts. Support for additional chart types such as line plots, scatter plots, and pie charts is coming soon.
- Accuracy is best for charts whose text is embedded in the document (we call these "native" charts).
- We encourage feedback, which helps us improve accuracy and expand the types of charts we support.
Feedback and Support
We value your feedback, which helps us improve Figure Extraction's performance. If you encounter any issues or have suggestions, please reach out to extract@kensho.com.
Stay Updated
Keep an eye on our release notes for updates and improvements to Figure Extraction based on your feedback and usage.