Azure Data Manager for Energy indexing and search workflows
All data and associated metadata ingested into the platform are indexed to enable search. The metadata stays accessible even when the underlying data isn't available, so users can still discover that the data exists.
Indexer Service
The Indexer Service provides a mechanism for indexing documents that contain structured and unstructured data.
Note: This service isn't a public service. It's meant to be called only internally by other core platform services.
Indexing workflow
The following diagram illustrates the indexing workflow:
[Diagram: indexing workflow]
When a customer loads data into the platform, the associated metadata is ingested through the Storage service. The Storage service provides APIs that manage the entire metadata lifecycle, including ingestion (persistence), modification, deletion, versioning, retrieval, and data schema management. Each metadata record created by the Storage service contains a kind parameter that refers to an underlying schema, and this schema determines which attributes the Indexer Service indexes.
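To make the link between a record's kind and its schema concrete, here is a minimal Python sketch. The record layout (id, kind, acl, legal, data) follows common OSDU conventions, but the record values, the HYPOTHETICAL_SCHEMAS table (a stand-in for the Schema service), and the split_kind and indexable_attributes helpers are illustrative assumptions, not the Storage or Indexer service implementation.

```python
from typing import Any

# Hypothetical stand-in for the Schema service: schema attributes keyed by kind.
HYPOTHETICAL_SCHEMAS: dict[str, dict[str, str]] = {
    "osdu:wks:master-data--Well:1.0.0": {
        "FacilityName": "string",
        "SpudDate": "datetime",
    },
}

# Illustrative storage record; its kind points at the schema above.
record: dict[str, Any] = {
    "id": "opendes:master-data--Well:example-1",
    "kind": "osdu:wks:master-data--Well:1.0.0",
    "acl": {
        "viewers": ["data.default.viewers@opendes.example.com"],
        "owners": ["data.default.owners@opendes.example.com"],
    },
    "legal": {"legaltags": ["opendes-example-legaltag"], "otherRelevantDataCountries": ["US"]},
    "data": {"FacilityName": "Well A", "SpudDate": "2021-03-01", "InternalNote": "not in schema"},
}


def split_kind(kind: str) -> tuple[str, str, str, str]:
    """Split a kind of the form authority:source:entityType:version."""
    authority, source, entity_type, version = kind.split(":")
    return authority, source, entity_type, version


def indexable_attributes(rec: dict[str, Any]) -> dict[str, Any]:
    """Keep only the data attributes that the record's schema declares."""
    schema = HYPOTHETICAL_SCHEMAS[rec["kind"]]
    return {name: value for name, value in rec["data"].items() if name in schema}


print(split_kind(record["kind"]))  # ('osdu', 'wks', 'master-data--Well', '1.0.0')
print(indexable_attributes(record))  # {'FacilityName': 'Well A', 'SpudDate': '2021-03-01'}
```

The attribute missing from the schema (InternalNote) is dropped, which mirrors the idea that the schema, not the record, decides what gets indexed.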
When the Storage service creates a metadata record, it raises a recordChangedMessages event, which is collected in Azure Service Bus (a message queue). The Indexer Queue service pulls the message from Azure Service Bus, performs basic validation, and forwards it to the Indexer Service. If sending a message to the Indexer Service fails, the Indexer Queue service retries up to a configurable maximum retry count. If all retry attempts fail, it sends a negative acknowledgment to Azure Service Bus, which then archives the message.
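The Indexer Queue service's retry behavior can be sketched as a bounded retry loop. This is a minimal Python illustration under stated assumptions: deliver is a hypothetical stand-in for the call to the Indexer Service, the recordIds message payload is invented for the example, and the returned strings "ack" and "nack" are placeholders for Azure Service Bus settlement. It doesn't use the Service Bus SDK and doesn't reflect the service's actual code, and it leaves out the validation step.

```python
from typing import Callable


def forward_with_retries(message: dict, deliver: Callable[[dict], None], max_retries: int) -> str:
    """Forward a message, retrying transient failures; return "ack" or "nack"."""
    # One initial attempt plus up to max_retries retries.
    for _ in range(max_retries + 1):
        try:
            deliver(message)
            return "ack"  # delivered: complete the message
        except ConnectionError:
            continue  # transient failure: try again
    return "nack"  # every attempt failed: negative acknowledgment


# Usage: a delivery function that fails twice and then succeeds.
attempts = {"count": 0}


def flaky_deliver(message: dict) -> None:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("Indexer Service unreachable")


def always_fail(message: dict) -> None:
    raise ConnectionError("Indexer Service unreachable")


print(forward_with_retries({"recordIds": ["r1"]}, flaky_deliver, max_retries=3))  # ack
print(forward_with_retries({"recordIds": ["r1"]}, always_fail, max_retries=3))  # nack
```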
When the Indexer Service receives the recordChangedMessages event, it fetches the required schemas from the schema cache or through the Schema service APIs. It then creates a new index in Elasticsearch (if one isn't already present) and sends a bulk request to create or update the records as needed. If Elasticsearch returns a failure of type service unavailable or request timed out, the Indexer Service creates recordChangedMessages for the failed record IDs and puts them on Azure Service Bus. The Indexer Queue service pulls these messages again, and they follow the same flow as before.
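Two pieces of this step, cache-first schema lookup and selecting which failed records to requeue, are sketched below in Python. It's a minimal illustration, not the Indexer Service's code: fake_schema_service is a hypothetical stand-in for the Schema service, the bulk response uses a simplified version of the Elasticsearch bulk API item shape, and treating HTTP 408 and 504 as "request timed out" is an assumption (503 is service unavailable).

```python
from typing import Callable

# Statuses treated as retryable. 503 is "service unavailable"; treating 408 and
# 504 as "request timed out" is an assumption made for this sketch.
RETRYABLE_STATUSES = {408, 503, 504}

# In-memory stand-in for the schema cache, keyed by kind.
schema_cache: dict[str, dict] = {}


def get_schema(kind: str, fetch_from_schema_service: Callable[[str], dict]) -> dict:
    """Return the schema for a kind, calling the Schema service only on a cache miss."""
    if kind not in schema_cache:
        schema_cache[kind] = fetch_from_schema_service(kind)
    return schema_cache[kind]


def ids_to_requeue(bulk_response: dict) -> list[str]:
    """Collect IDs of records whose bulk operation failed with a retryable status."""
    failed_ids = []
    for item in bulk_response["items"]:
        # Each bulk item has a single key naming its action, for example "index".
        (result,) = item.values()
        if result["status"] in RETRYABLE_STATUSES:
            failed_ids.append(result["_id"])
    return failed_ids


# Usage with stand-ins for the Schema service and an Elasticsearch bulk response.
schema_service_calls: list[str] = []


def fake_schema_service(kind: str) -> dict:
    schema_service_calls.append(kind)
    return {"FacilityName": "string"}


kind = "osdu:wks:master-data--Well:1.0.0"
get_schema(kind, fake_schema_service)
get_schema(kind, fake_schema_service)  # served from the cache
print(len(schema_service_calls))  # 1

bulk_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "opendes:well:1", "status": 201}},
        {"index": {"_id": "opendes:well:2", "status": 503}},
        {"index": {"_id": "opendes:well:3", "status": 400}},
    ],
}
print(ids_to_requeue(bulk_response))  # ['opendes:well:2']
```

Only the 503 failure is requeued; the 400 failure is a non-retryable error, which the source's workflow doesn't send back through Azure Service Bus.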
Search workflow
The Search service provides a mechanism for discovering indexed metadata documents. The Search API supports full-text search on string fields, range queries on date, numeric, and string fields, and geospatial searches.
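As a rough illustration of these query types, the Python sketch below builds (but doesn't send) a search request that combines a full-text match, a date range, and a bounding-box filter. The endpoint path and the kind, query, spatialFilter, and limit body fields follow the OSDU Search v2 API as we understand it; the instance URL, data partition ID, and the data.FacilityName, data.SpudDate, and data.SpatialLocation.Wgs84Coordinates field names are hypothetical, so check them against your instance and schemas.

```python
import json
import urllib.request

# Hypothetical values; substitute your instance URL and data partition ID.
BASE_URL = "https://contoso.energy.azure.com"
DATA_PARTITION = "contoso-opendes"


def build_search_request(token: str) -> urllib.request.Request:
    """Build, without sending, a POST that queries indexed Well records."""
    body = {
        "kind": "osdu:wks:master-data--Well:1.0.0",
        # Full-text match combined with a date range query (Lucene query syntax).
        "query": 'data.FacilityName:"Well A" AND data.SpudDate:[2020-01-01 TO 2021-12-31]',
        # Geospatial filter: records whose location falls inside a bounding box.
        "spatialFilter": {
            "field": "data.SpatialLocation.Wgs84Coordinates",
            "byBoundingBox": {
                "topLeft": {"latitude": 60.0, "longitude": 1.0},
                "bottomRight": {"latitude": 58.0, "longitude": 3.0},
            },
        },
        "limit": 10,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/search/v2/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": DATA_PARTITION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request("<access-token>")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```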
When metadata records are loaded into the platform through the Storage service, you can configure viewer and owner permissions in each record's acl field. Viewers and owners are assigned through groups defined in the Entitlements service. When a user performs a search, matched metadata records appear only if the user is a member of a group listed in the record's acl field.
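The visibility rule can be sketched as a set intersection between a record's ACL groups and the user's groups. This minimal Python example is illustrative only: the group addresses and the visible_records helper are hypothetical, and it assumes that membership in either the viewers or the owners group makes a record visible. The real filtering happens inside the Search service, not in client code.

```python
# Hypothetical group addresses in the style of Entitlements groups.
VIEWERS = "data.wells.viewers@contoso.dataservices.energy"
OWNERS = "data.wells.owners@contoso.dataservices.energy"
OTHER = "data.seismic.viewers@contoso.dataservices.energy"

records = [
    {"id": "rec-1", "acl": {"viewers": [VIEWERS], "owners": [OWNERS]}},
    {"id": "rec-2", "acl": {"viewers": [OTHER], "owners": [OTHER]}},
]


def visible_records(matched: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only matched records whose acl lists at least one of the user's groups."""
    result = []
    for record in matched:
        acl_groups = set(record["acl"]["viewers"]) | set(record["acl"]["owners"])
        if acl_groups & user_groups:  # non-empty intersection grants visibility
            result.append(record)
    return result


print([r["id"] for r in visible_records(records, {VIEWERS})])  # ['rec-1']
print([r["id"] for r in visible_records(records, set())])  # []
```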
The Reindex API lets users reindex a kind without reingesting its records through the Storage API. For details, refer to the Reindex OSDU® documentation.
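A reindex call can be sketched as a single POST that names the kind. The Python example below builds the request without sending it. The /api/indexer/v2/reindex path and the {"kind": ...} body reflect the OSDU Indexer reindex endpoint as we understand it and should be confirmed against the Reindex OSDU® documentation; the instance URL, data partition ID, and token placeholder are hypothetical.

```python
import json
import urllib.request

# Hypothetical values; substitute your instance URL and data partition ID.
BASE_URL = "https://contoso.energy.azure.com"
DATA_PARTITION = "contoso-opendes"


def build_reindex_request(kind: str, token: str) -> urllib.request.Request:
    """Build, without sending, a POST that asks the indexer to reindex one kind."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/indexer/v2/reindex",
        data=json.dumps({"kind": kind}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "data-partition-id": DATA_PARTITION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_reindex_request("osdu:wks:master-data--Well:1.0.0", "<access-token>")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```

Because only the kind is sent, the indexer rebuilds the index from records already held by the Storage service, which is what lets you avoid reingesting them.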