Overview
AI-powered information extraction and analysis enable organizations to gain actionable insights from data that would otherwise be locked in documents, images, audio files, or other assets. These insights can come from both structured and unstructured content. Structured content is information stored in a consistent format, such as invoices, tax forms, and tables. Unstructured content is information that isn't in a predefined format, such as emails, audio recordings, images, and videos.
Information extraction processes
In general, information extraction processes follow these steps:
Step | Description |
---|---|
Source Identification | Determine where the information resides and whether it needs to be digitized. |
Extraction | Apply machine learning techniques to understand and extract data from the digitized content. |
Transformation & Structuring | Transform the extracted data into structured formats such as JSON or tables. |
Storage & Integration | Store the processed data in databases, data lakes, or analytics platforms for further use. |
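The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the sample invoice text, the function names, and the `invoices` table are all hypothetical, and the regular expressions are a simple stand-in for the machine learning models a real extraction service would use.

```python
import json
import re
import sqlite3

# Hypothetical digitized text. In practice this might come from OCR
# or speech-to-text applied to a scanned document or recording.
RAW_DOCUMENT = "Invoice INV-1042 from Contoso Ltd. Total due: $1,250.00 by 2024-03-15."


def identify_source() -> str:
    """Step 1, Source Identification: locate the content (already digitized here)."""
    return RAW_DOCUMENT


def extract(text: str) -> dict:
    """Step 2, Extraction: pull fields out of the text.

    Regular expressions stand in for an ML model in this sketch.
    """
    invoice = re.search(r"INV-\d+", text)
    total = re.search(r"\$([\d,]+\.\d{2})", text)
    due = re.search(r"\d{4}-\d{2}-\d{2}", text)
    return {
        "invoice_id": invoice.group(0) if invoice else None,
        "total": total.group(1) if total else None,
        "due_date": due.group(0) if due else None,
    }


def transform(fields: dict) -> str:
    """Step 3, Transformation & Structuring: normalize values and emit JSON."""
    record = dict(fields)
    if record["total"] is not None:
        # "1,250.00" -> 1250.0
        record["total"] = float(record["total"].replace(",", ""))
    return json.dumps(record)


def store(record_json: str, conn: sqlite3.Connection) -> None:
    """Step 4, Storage & Integration: persist the structured record."""
    record = json.loads(record_json)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices "
        "(invoice_id TEXT, total REAL, due_date TEXT)"
    )
    conn.execute(
        "INSERT INTO invoices VALUES (?, ?, ?)",
        (record["invoice_id"], record["total"], record["due_date"]),
    )
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> str:
    """Run all four steps in order and return the structured JSON record."""
    text = identify_source()
    fields = extract(text)
    record_json = transform(fields)
    store(record_json, conn)
    return record_json


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection))
```

Keeping each step in its own function mirrors the table: the extraction step could be swapped for a real model without changing how the data is structured or stored.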
Both the type of content and the type of insights needed from that content determine which techniques are necessary for information extraction. This module covers AI-based extraction of information:
- From images
- From forms
- From multiple modalities
- For knowledge mining
In many ways, the techniques used for images, forms, multiple modalities, and knowledge mining build upon each other.