
Amazon's optical character recognition toy Textract is here but still a bit short-sighted

Michael Wadsworth

Auto-detects structured data... some of the time

Optical character recognition (OCR) is a mature technology built into many applications.

Insert a scanned document into Microsoft's OneNote, for example, and you can "copy text from picture" with reasonable results.

Using the Textract API, you can convert documents programmatically, either in bulk or as part of a workflow.
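As a rough illustration of what a bulk conversion might look like, here is a minimal Python sketch. It assumes the boto3 SDK and Textract's synchronous `detect_document_text` call; the helper names `lines_from_response` and `convert_documents` are made up for this example, and the client is passed in so the parsing logic can be tried without AWS credentials.

```python
from typing import Any, Dict, Iterable, List


def lines_from_response(response: Dict[str, Any]) -> List[str]:
    """Collect the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def convert_documents(client: Any, documents: Iterable[bytes]) -> List[List[str]]:
    """Send each document image to Textract and return its lines of text.

    `client` is assumed to be a boto3 Textract client (or a stand-in with
    the same detect_document_text method).
    """
    results = []
    for image_bytes in documents:
        # Synchronous call: one image (PNG or JPEG bytes) per request.
        response = client.detect_document_text(Document={"Bytes": image_bytes})
        results.append(lines_from_response(response))
    return results
```

With real credentials, the client would come from `boto3.client("textract")` and the documents from files opened in binary mode. The response also contains PAGE and WORD blocks, which this sketch deliberately ignores.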

Textract also works in conjunction with other AWS services such as Amazon Translate or Amazon Comprehend (a machine learning service to find "insights and relationships in text", according to Amazon).
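To give a flavour of how such a pipeline could be wired up, the sketch below passes already-extracted text through Translate and then Comprehend. It assumes boto3's `translate_text` and `detect_entities` calls; the function name `translate_and_analyse` and the default language codes are illustrative choices, not anything prescribed by Amazon.

```python
from typing import Any, Dict, List, Tuple


def translate_and_analyse(
    text: str,
    translate_client: Any,
    comprehend_client: Any,
    source_lang: str = "de",  # assumed source language for this example
    target_lang: str = "en",
) -> Tuple[str, List[Dict[str, Any]]]:
    """Translate OCR output, then ask Comprehend for entities in the result."""
    translated = translate_client.translate_text(
        Text=text,
        SourceLanguageCode=source_lang,
        TargetLanguageCode=target_lang,
    )["TranslatedText"]

    # Comprehend needs to be told which language it is reading.
    entities = comprehend_client.detect_entities(
        Text=translated,
        LanguageCode=target_lang,
    )["Entities"]
    return translated, entities
```

In practice the two clients would come from `boto3.client("translate")` and `boto3.client("comprehend")`. Both services limit how much text a single request may carry, so longer documents would probably need to be split into chunks first.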

Another relevant service is Amazon Elasticsearch Service, a deployment of Elastic's open-source search engine, which lets you search and analyse text.
