Overview

AI-powered information extraction and analysis enables organizations to gain actionable insights from data that might otherwise be locked up in documents, images, audio files, or other assets. Insights can come from structured and unstructured content. Structured content is information stored in a consistent format. Some examples include invoices, tax forms, and tables. Unstructured content is information that isn't in a predefined format. Some examples include emails, audio recordings, images, and videos.

Information extraction processes

In general, information extraction processes follow these steps:

Step | Description
Source Identification | Determine where the information resides and whether it needs to be digitized.
Extraction | Apply a range of machine learning techniques to understand and extract data from the digitized content.
Transformation & Structuring | Transform the extracted data into structured formats such as JSON or tables.
Storage & Integration | Store the processed data in databases, data lakes, or analytics platforms for further use.
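
The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions, not how any particular AI service works: the sample text, the function names, and the table name are all hypothetical, and regular expressions stand in for the machine learning models a real extraction step would use.

```python
import json
import re
import sqlite3

# Hypothetical digitized text, for example the OCR output of a scanned invoice.
SAMPLE_TEXT = "Invoice INV-1001 issued to Contoso. Total due: 249.50 USD."


def identify_source(text):
    """Step 1, source identification: confirm there is digitized content."""
    if not text.strip():
        raise ValueError("no digitized content found")
    return text


def extract_fields(text):
    """Step 2, extraction: regular expressions stand in for an ML model."""
    invoice = re.search(r"INV-\d+", text)
    total = re.search(r"Total due: (\d+\.\d+)", text)
    return {
        "invoice_id": invoice.group(0) if invoice else None,
        "total": float(total.group(1)) if total else None,
    }


def to_json(fields):
    """Step 3, transformation and structuring: serialize the fields as JSON."""
    return json.dumps(fields, sort_keys=True)


def store(conn, record_json):
    """Step 4, storage and integration: save the record in a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS extracted (record TEXT)")
    conn.execute("INSERT INTO extracted (record) VALUES (?)", (record_json,))
    conn.commit()


def run_pipeline(text, conn):
    """Run all four steps in order and return the stored JSON record."""
    record = to_json(extract_fields(identify_source(text)))
    store(conn, record)
    return record


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")  # in-memory database for the demo
    print(run_pipeline(SAMPLE_TEXT, connection))
```

Keeping each step in its own function mirrors the table: the extraction function could be swapped for a call to a trained model without changing how the data is structured or stored.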

Both the type of content and the type of insights needed from that content determine which extraction techniques are necessary. This module looks at using AI to extract information:

  • From images
  • From forms
  • From multiple modalities
  • For knowledge mining

In many ways, the techniques used for images, forms, multiple modalities, and knowledge mining build upon each other.