Operational data estate

Introduction

In financial services application landscapes, efficient management of operational data is a cornerstone of success. Microsoft Cloud for Financial Services introduces tailored operational data models for specific processes like single customer view or onboarding applications.

This article explains how to build an operational data estate that stays in sync with your core banking systems and other systems of record.

Operational data estate reference architecture

A Microsoft Cloud for Financial Services implementation starts with deploying the data model for the selected capabilities to Dataverse. For the Dataverse data persistency scenario, the following diagram shows the ingest and transform, store, and model steps of the operational data reference architecture in Microsoft Cloud for Financial Services.

A diagram showing the solution components of operational data estate reference architecture

Download a printable PDF of this solution architecture diagram.

The reference architecture includes three types of components: built-in data components, data integration components, and, optionally, data extension components.

Built-in data components

The Microsoft Cloud for Financial Services solution uses Dataverse to store and secure app data. The data in Dataverse is modeled based on the Common Data Model (CDM) schema. The Common Data Model offers a set of standardized, extensible data structures. The solution deploys some ready-to-use foundational (that is, the banking core data model) and capability-based data models that use the CDM schema. As mentioned in types of operational data, Dataverse stores the four core data categories (master, transactional, configuration, and inferred) as standard tables.

Data integration components

Financial institutions might have various financial and customer information managed by different systems of record. As mentioned, with the Dataverse persistency option, you need to hydrate and feed data elements from these systems into Dataverse. You can perform ingest and transform operations on the required data elements by using different asynchronous and batch processing technologies, including Azure Logic Apps, Azure Data Factory, Data Flow, and Azure Functions. We recommend that you start the design, independent of your chosen integration approach, by creating a data mapping between the source systems and the financial services data model.

Data extension components (optional)

Based on your requirements, you might need to extend the data model with more data elements. Dataverse provides the capability to extend the data model with new tables, fields, and choices on top of the underlying data models of the Microsoft Cloud for Financial Services solution.

For some of the extension scenarios mentioned earlier, you might want to explore the option of using virtual tables or elastic tables (preview) as a means of preventing data proliferation or coping with substantial data volumes, such as interaction history and account transactions. However, it's essential to carefully evaluate the implications and considerations associated with implementing virtual tables and elastic tables in your design.

Data implementation journey

The data implementation journey involves reviewing the data's current state and mapping it to the relevant Microsoft Cloud for Financial Services data models. The mapping rule clarifications and data cleansing take place well in advance of the actual data sync process. During data sync, the conversion process uses the data extracted from legacy applications, transforms it through mapping and conversion, and loads it into the Microsoft Cloud for Financial Services data models. The process also provides for integration testing and validation to confirm the consistency and integrity of the hydrated data.

The data implementation journey for Microsoft Cloud for Financial Services can be broken down into the following stages to ensure a seamless and effective adoption of the platform.

  • Discover & Analyze
  • Planning & Design
  • Implementation & Test
  • Deployment & Go-Live

A diagram showing the data implementation journey of operational data estate

Download a printable PDF of this diagram.

Here are the key stages of the data implementation journey:

Discover and analyze

Requirements gathering

Collaborate with stakeholders, subject matter experts, and end-users to understand the business goals, data needs, and processes within the domain. Document the requirements and ensure clarity on the information to be represented in the model. Make sure to include non-functional requirements such as specific regulatory compliance requirements and data protection laws. Also, include specific data security requirements to safeguard sensitive financial information and prevent unauthorized access.

Document existing data landscape

Document the existing data landscape by identifying each data source, its data quality (correctness, cleanliness, and completeness), connectivity, and data governance practices, to identify any data-related challenges and opportunities. Evaluate the features and capabilities of the data platforms to ensure they align with the organization's data requirements and long-term objectives. The key point for the data landscape is to create a holistic view of the data repositories, their relationships with other systems, the conceptual data flow, and ownership. This information helps meet the integration requirements and provides a reference for the broader project team.

When you prepare your data landscape, you can use the following questions to guide you.

  • What are the key entities required to model the required processes onto Cloud for FSI solutions like Unified Customer Profile or Customer Onboarding? Examples include customers, financial holdings, employees, and branches.
  • Which is the system of record for these data elements? Is it a single system or are these entities held in different data stores? Are there any data silos?
  • How do you manage your data and integrate it across your lines of business?
  • What are the privacy and security aspects of the data?

The data landscape for a typical financial institution can span many systems of record, as illustrated:

Entity-relationship diagram of life event and financial goals capability in unified customer/client profile application

Download a printable PDF of this diagram.

Walkthrough of Microsoft Cloud for Financial Services data models

Walk through the out-of-the-box user interface and its mapping to the underlying data models. The following diagram illustrates the mapping of the Investment Details tab onto the Banking core data model.

A diagram showing the financial service contextual data landscape.

Download a printable PDF of this diagram.

You can download the Visio file that includes the Entity-Relationship Diagrams (ERDs) of each first-party solution.

Planning and design

Data conversion strategy

A successful data conversion from source systems to Microsoft Cloud for Financial Services typically requires a well-defined data conversion strategy. The strategy should document several essential elements, such as:

  • Scope and objectives: Clearly defined goals and objectives for the data conversion process, including what data is converted, what isn't, and any specific targets or metrics to achieve.
  • Data assessment and profiling: Details on how source data is assessed and profiled, including data quality analysis, dependencies, and structure evaluation.
  • Data mapping and transformation: Information on how data is mapped from source systems to the target system, including transformation rules and logic. You can refer to the details of this element in the subsequent section.
  • Data cleansing and enrichment: Strategies and processes for cleaning and enriching data to ensure accuracy and completeness.
  • Integration approach: Explanation of the ETL or integration tools, methodologies, and processes to use in data extraction, transformation, and loading into the target system. After progressing with or completing the data mapping exercise, you can use the compiled information and decide on the right integration pattern for initial load and synchronizing changes from source to target. See the Cross-Industry Integration patterns article for information on integration patterns.
  • Data validation and quality assurance: Description of the applied validation processes, quality checks, and data reconciliation methods to ensure data accuracy and integrity.
  • Data security and compliance: Measures and protocols for data security, including access controls, encryption, and compliance with relevant industry and country/region regulations and standards.
  • Risk management: Identification of potential risks and mitigation strategies related to data conversion, such as data loss, downtime, or performance issues.
  • Testing and validation: A plan for testing the converted data to ensure it meets the required standards and is ready for production use.
  • Documentation and logging: Guidelines for documenting the entire data conversion process, including data dictionaries, transformation rules, migration logs, and any relevant metadata.
  • Resource and timeline planning: Allocation of resources (human and technical) and a detailed timeline for each phase of the data conversion project.
  • Contingency and rollback plan: Provides for handling unexpected issues or failures, including a contingency plan and procedures for rolling back to the previous state if necessary.
  • Monitoring and performance optimization: Strategies for monitoring data in the target system post-conversion and optimizing its performance.

Sizing environment

Data conversion tasks require a scalable environment. We recommend using scalable infrastructure that can handle the high volume of the initial load and any potential delta loads. For more information, see the Well-Architected Framework Performance Efficiency - Design applications for scaling article.

Data modeling

Identify entities: Identify the main objects or entities required for the processes. These entities represent the fundamental concepts or things of interest within the organization. Do a fit/gap analysis to determine whether these entities exist within the Microsoft Cloud for Financial Services data model.

Determine relationships: Establish relationships between the entities to show how they're related or connected to each other. Relationships can be one-to-one, one-to-many, or many-to-many. For example, a "contact" entity has a many-to-many relationship with a "financial holdings" entity.
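
In Dataverse, many-to-many relationships are typically realized through an intersect (junction) table. As a rough illustration of capturing such a relationship during conceptual modeling, the following Python sketch represents contacts, financial holdings, and a junction structure between them. The class and field names are illustrative only and aren't the actual Dataverse table names.

    from dataclasses import dataclass
    from typing import List

    # Illustrative conceptual entities; names are not the actual Dataverse tables.
    @dataclass
    class Contact:
        contact_id: str
        full_name: str

    @dataclass
    class FinancialHolding:
        holding_id: str
        holding_type: str   # for example: "Savings", "Investment", "Loan"
        balance: float

    @dataclass
    class ContactHolding:
        """Junction (intersect) record expressing the many-to-many relationship."""
        contact_id: str
        holding_id: str
        role: str = "Owner"  # a contact can be an owner, beneficiary, and so on

    # A contact can have many holdings, and a holding can belong to many contacts.
    relationships: List[ContactHolding] = [
        ContactHolding("C-001", "FH-100"),
        ContactHolding("C-001", "FH-200", role="Beneficiary"),
        ContactHolding("C-002", "FH-100"),
    ]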

Design conceptual data model: Designing a conceptual data model provides a simplified and abstract representation of the data entities and their associations, focusing more on business concepts than on technical implementation details. The following is a sample conceptual data model developed for the Investment Details tab in the Banking core data model.

Sample conceptual data model developed for investment details tab in Banking Core Data Model

Review and validate: Review the conceptual data model with stakeholders to ensure it accurately represents the business requirements and relationships.

Iterate and refine: Iterate and refine the conceptual data model as needed based on feedback and evolving business needs.

Identify attributes: For each table, identify and define the attributes or properties that describe the entity. Assess the reuse of existing fields provided within the data model and identify fields for extension.

Document the model: Document the data model thoroughly, including a data dictionary that defines each entity and attribute in detail. This documentation will serve as a reference for future data modeling efforts. Represent the data model visually using an Entity-Relationship Diagram. The following figure illustrates an Entity-Relationship diagram for the Unified Customer Profile - Base Profile capability.

Entity-Relationship diagram for the Unified Customer Profile - Base Profile capability

Data mapping

For data conversion, you need a data mapping from the source system's data elements to each attribute in the identified data model. You can follow these steps to compile the information:

  • Gather information about each data source. The information includes connectivity to and from the public cloud, the extraction approach (flat files, CSV, direct query) for each data source, volatility (frequency of change), and volume (size of data).
  • Identify the source format. You should gather information on every data element in the data source that's earmarked for mapping. The information to gather includes data formats and their completeness (whether any cleansing is necessary).
  • Document the transformation logic. Document any transformation rules that you need to apply to the source data before ingestion, along with other checks such as mandatory fields, default values, and option-set values.

You can use the Data Hydration for Microsoft Cloud for Financial Services information to initiate your data mapping documentation. You can also build your own data mapping document by exporting the data model elements with the XrmToolBox community tool.
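
To make the mapping actionable in later conversion scripts, you can also keep it in a machine-readable form. The following Python sketch shows one possible shape for such a mapping document; the source and target names are hypothetical examples, not authoritative Microsoft Cloud for Financial Services table or column names.

    # A hypothetical, machine-readable mapping entry per target attribute.
    # Source and target names below are illustrative only.
    data_mapping = [
        {
            "source_system": "CoreBanking",
            "source_entity": "CUSTOMER_MASTER",
            "source_field": "CUST_FULL_NAME",
            "target_table": "contact",
            "target_column": "fullname",
            "transformation": "trim; title-case",
            "mandatory": True,
            "default_value": None,
        },
        {
            "source_system": "CoreBanking",
            "source_entity": "ACCOUNT",
            "source_field": "ACCT_TYPE_CODE",
            "target_table": "financialholding",   # illustrative name
            "target_column": "holdingtype",
            "transformation": "map legacy code to option-set value",
            "mandatory": True,
            "default_value": "Unknown",
        },
    ]

    def mandatory_targets(mapping):
        """Return the target columns that must be populated before load."""
        return [(m["target_table"], m["target_column"]) for m in mapping if m["mandatory"]]

    print(mandatory_targets(data_mapping))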

Data cleansing and enrichment

We recommend that the data analyst team begin the process of identifying, consolidating, deduplicating, scrubbing, and cleansing the data for correctness and completeness. The goals are to limit the number of transformations needed during conversion, to curate the data and ensure it meets the required quality standards, and to enrich it with more relevant information where necessary. The more transformation logic that's required, the longer the data import takes.
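
As a minimal sketch of this kind of pre-conversion cleansing, the following Python snippet uses pandas to trim whitespace, drop exact duplicates, and flag rows that are missing mandatory values. The column names and sample data are hypothetical.

    import pandas as pd

    # Hypothetical extract of legacy customer records.
    raw = pd.DataFrame({
        "cust_id":   ["1001", "1001", "1002", "1003"],
        "full_name": [" Jane Doe ", " Jane Doe ", "JOHN SMITH", None],
        "email":     ["jane@contoso.com", "jane@contoso.com", None, "pat@contoso.com"],
    })

    # Cleanse: trim whitespace and normalize casing.
    cleaned = raw.copy()
    cleaned["full_name"] = cleaned["full_name"].str.strip().str.title()

    # Deduplicate exact repeats of the same customer record.
    cleaned = cleaned.drop_duplicates()

    # Flag rows missing mandatory values so they can be fixed or enriched upstream.
    mandatory = ["cust_id", "full_name"]
    cleaned["needs_review"] = cleaned[mandatory].isna().any(axis=1)

    print(cleaned)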

Implementation and testing

Data conversion flow

After completion of the planning and design activities, the implementation phase includes activities such as:

  • Setting up the environment for development and testing
  • Preparing production-like migration data for development and testing
  • Creating and populating configuration tables to support bulk and initial load from the same script
  • Development of data transformation rules such as data cleansing and conversion scripts
  • Development of data validation scripts
  • Logging and monitoring the success of data loads

These activities are performed as part of the following data conversion flow. Each step of the flow is described as follows:

A diagram representing data conversion flow

Download a printable PDF of this diagram.

  • Load reference data into master tables for data conversion. You need to provide reference data for lookups and option sets in tables used by the conversion scripts. This data should be updated whenever new lookups are introduced, to avoid failures caused by referential integrity.

  • The ETL/ELT tool of your choice extracts raw data into Azure Data Lake Storage Gen2.

  • The ETL/ELT tool of your choice loads the raw data into SQL Database and/or Data Lake Storage Gen2, where the raw data is staged for processing.

  • Data cleansing and enrichment scripts are executed to remove unwanted data elements from the raw data (cleanse) and to add missing or additionally needed data elements to it (enrich).

  • Data transformation scripts are executed to transform cleansed and enriched data to the target data format. These scripts are developed based on the information recorded in data mapping. We recommend that you transform the data to the same format as the target data model so that data validations can be executed before the load process.

These scripts need to be ordered based on dependencies. Follow the guidance on Data Hydration for Microsoft Cloud for Financial Services to load the tables in the given order.
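
As a rough illustration of a transformation step driven by the data mapping, the following Python sketch converts legacy option-set codes and reshapes cleansed rows into a target-like format before validation and load. The code mappings, option-set values, and field names are hypothetical.

    # Hypothetical legacy-code-to-option-set mapping captured during data mapping.
    ACCOUNT_TYPE_MAP = {"SV": 104800000, "CH": 104800001, "LN": 104800002}

    def transform_row(row: dict) -> dict:
        """Reshape a cleansed source row into the (illustrative) target format."""
        option_value = ACCOUNT_TYPE_MAP.get(row["acct_type_code"])
        if option_value is None:
            raise ValueError(f"Unmapped account type code: {row['acct_type_code']}")
        return {
            "name": row["acct_name"].strip(),
            "holdingtype": option_value,          # illustrative target column
            "balance": float(row["balance"]),
            "legacy_id": row["acct_id"],          # kept for traceability and upserts
        }

    source_rows = [
        {"acct_id": "A-1", "acct_name": " Everyday Savings ", "acct_type_code": "SV", "balance": "2500.75"},
        {"acct_id": "A-2", "acct_name": "Home Loan", "acct_type_code": "LN", "balance": "185000"},
    ]

    transformed = [transform_row(r) for r in source_rows]
    print(transformed)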

  • Execute data validation scripts before starting the load process to avoid failures during the load and to ensure data integrity. The major risk that this approach helps avoid is data corruption caused by the conversion scripts. You can create a data validation script for each table and add the following checks (a sketch follows the checklist):

    • Are all master lookups resolved for legacy master codes?
    • Are all option-set values resolved for legacy values?
    • Is all the mandatory business information resolved?
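
A minimal Python sketch of such per-table checks, assuming the transformed rows and resolved reference data are already staged in memory; the table, column, and lookup names are illustrative.

    # Illustrative per-table validation before load; names are hypothetical.
    VALID_OPTION_SET_VALUES = {104800000, 104800001, 104800002}
    RESOLVED_MASTER_LOOKUPS = {"BR-001", "BR-002"}   # for example, branch codes
    MANDATORY_FIELDS = ["name", "holdingtype", "branch_code"]

    def validate_rows(rows):
        """Return (row_index, problem) tuples; an empty list means the table is load-ready."""
        problems = []
        for i, row in enumerate(rows):
            if row.get("branch_code") not in RESOLVED_MASTER_LOOKUPS:
                problems.append((i, f"unresolved master lookup: {row.get('branch_code')}"))
            if row.get("holdingtype") not in VALID_OPTION_SET_VALUES:
                problems.append((i, f"unresolved option-set value: {row.get('holdingtype')}"))
            for field in MANDATORY_FIELDS:
                if row.get(field) in (None, ""):
                    problems.append((i, f"missing mandatory field: {field}"))
        return problems

    rows = [
        {"name": "Everyday Savings", "holdingtype": 104800000, "branch_code": "BR-001"},
        {"name": "", "holdingtype": 999, "branch_code": "BR-999"},
    ]
    for index, problem in validate_rows(rows):
        print(f"row {index}: {problem}")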
  • Load transformed data into Dataverse by using the API (ExecuteMultipleRequest method). You can also use third-party connectors for Dataverse to simplify payload generation, parallel thread execution, retry mechanisms, and recording of load results.
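
As a hedged sketch only, the snippet below loads records through the Dataverse Web API with plain per-record POSTs to keep the example simple; a production load would typically batch requests (for example, ExecuteMultiple or $batch) and use a proper OAuth token flow. The environment URL, entity set name, and token value are assumptions you'd replace with your own.

    import json
    import urllib.error
    import urllib.request

    # Assumptions: replace with your environment URL, entity set name, and token.
    DATAVERSE_URL = "https://yourorg.api.crm.dynamics.com/api/data/v9.2"
    ENTITY_SET = "contacts"
    ACCESS_TOKEN = "<acquired via your OAuth client credential flow>"

    def create_record(record: dict) -> bool:
        """POST a single record; returns True on success. Production loads would batch these."""
        request = urllib.request.Request(
            url=f"{DATAVERSE_URL}/{ENTITY_SET}",
            data=json.dumps(record).encode("utf-8"),
            method="POST",
            headers={
                "Authorization": f"Bearer {ACCESS_TOKEN}",
                "Content-Type": "application/json",
                "OData-MaxVersion": "4.0",
                "OData-Version": "4.0",
            },
        )
        try:
            with urllib.request.urlopen(request) as response:
                return response.status in (200, 204)
        except urllib.error.HTTPError as error:
            print(f"load failed ({error.code}): {error.read().decode('utf-8', 'ignore')}")
            return False

    transformed = [{"firstname": "Jane", "lastname": "Doe"}]
    results = [create_record(record) for record in transformed]
    print(f"{sum(results)} of {len(results)} records loaded")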

  • Log the execution of each load action. Record successful and failed records to support logging and retry logic. You can create a table like the following to store package execution log information:

Column name             Description
SeqId                   Identity column
ExecutionInstanceGuid   Current package execution ID
PackageName             Current package name
StartTime               Start time of the package
EndTime                 End time of the package
PackageStatus           Current status of the package: Running, Failed, or Success
InitialRowCount         Initial row count
CreatedRowCount         Number of records created in Dataverse
UpdatedRowCount         Number of records updated in Dataverse
ErrorRowCount           Number of records that failed during data conversion
Message                 Message (if any). Contains the error message if PackageStatus = Failed

You can create an error table to store error messages for each failed record.

  • Create reporting to monitor each data conversion cycle, and create alerts for failures that exceed a defined threshold.

  • The solution should use platform services for collaboration, performance, reliability, governance, and security:

    • Microsoft Purview provides data discovery services, sensitive data classification, and governance insights across the data estate.
    • Azure DevOps offers continuous integration and continuous deployment (CI/CD) and other integrated version control features.
    • Azure Key Vault securely manages secrets, keys, and certificates.
    • Microsoft Entra ID provides single sign-on (SSO) for ETL/ELT tool users.
    • Azure Monitor collects and analyzes Azure resource telemetry. The service proactively identifies problems, and maximizes performance and reliability.
    • The Power Platform admin center provides Dataverse analytics, including API call statistics.

Initial load and delta load

A data conversion process includes the extraction of data from a source system, transformation of the data, and load into a destination system (in this case, Dataverse with the Microsoft Cloud for Financial Services data model). You need an initial load in which the reference data catalog is imported first, followed by the master and transactional data, to hydrate the essential tables for the business processes. A delta load usually refers to capturing the changes in the source system (creates, updates, and deletes) that occurred since the initial load or the last delta load, and applying only those changes to the destination.

The most important aspect of developing a data conversion plan is to ensure data integrity. The major risk to avoid is data corruption caused by the conversion scripts.

Bugs in data conversion scripts can easily compromise data integrity, which could render the entire system corrupt and unusable. The risk of data corruption is serious, and complex areas of the data conversion in particular need to be tested extensively. The best way to ensure data integrity is to keep the conversion simple.

Following are some important questions when considering delta loads as part of a data migration:

  • How are the changes in the source system captured and in what format?
  • Which system is the system of record for a given data element? Will any changes made in Dataverse need to be synced back to the legacy system, such as data elements captured in the document intelligence or onboarding processes?
  • Are the data structures the same in the source and destination? Is there normalization or denormalization happening in the transformation process?
  • How many entities are you expected to handle deltas for? What is the complexity level of the data model for these entities?
  • What complications can arise if there are dependent child records to be created or removed before creating or deleting the parent? Such aspects are often complex for deltas.
  • Are there any data elements that can be modified in both systems? If so, how should you handle conflicts when both systems have modifications for the same record?

You can structure your data conversion scripts around an upsert operation so that the same script handles both the initial load and delta loads, covering both new and changed records. You need the GUIDs from Dataverse for the upsert to update pre-existing records; otherwise, it creates new records. For the GUIDs, you can either pre-create them in your code by using a function like NEWID(), or capture the GUID created by Dataverse and store it back in your staging table for that record.
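
As a minimal sketch, assuming a Dataverse environment URL, entity set, and token of your own: a PATCH request addressed by the record's ID behaves as an upsert in the Dataverse Web API, updating the record when the ID exists and creating it otherwise. The GUID handling below follows the pre-created-GUID approach described above; the names are placeholders.

    import json
    import urllib.request
    import uuid

    # Assumptions: replace the URL, entity set, column names, and token with your own.
    DATAVERSE_URL = "https://yourorg.api.crm.dynamics.com/api/data/v9.2"
    ENTITY_SET = "contacts"
    ACCESS_TOKEN = "<acquired via your OAuth client credential flow>"

    def upsert_record(record_id: uuid.UUID, payload: dict) -> None:
        """PATCH by ID: updates the record if it exists, otherwise creates it (upsert)."""
        request = urllib.request.Request(
            url=f"{DATAVERSE_URL}/{ENTITY_SET}({record_id})",
            data=json.dumps(payload).encode("utf-8"),
            method="PATCH",
            headers={
                "Authorization": f"Bearer {ACCESS_TOKEN}",
                "Content-Type": "application/json",
                "OData-MaxVersion": "4.0",
                "OData-Version": "4.0",
            },
        )
        with urllib.request.urlopen(request):
            pass  # 204 No Content on success

    # Pre-create the GUID in your staging data (the NEWID()-style approach) so that
    # re-running the same script updates rather than duplicates the record.
    staged_guid = uuid.uuid4()
    upsert_record(staged_guid, {"firstname": "Jane", "lastname": "Doe"})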

Best practices for data conversion implementation

  • Use the Bulk Operations API (preview) or the Batch API to load records in sets of up to 1,000 in one request, instead of loading records one by one.
  • Send parallel requests. When you use the bulk update API, the best results are achieved with no more than two threads running concurrently. If more than two threads are used, the operation might return a server-busy error, and the throughput can't exceed what two threads can achieve.
  • Create production-like data for development and testing. The data can be generated from the production system, and it must be masked, obfuscated, or de-identified before being used for development. System testing with converted data identifies as many problems as possible in advance. It also allows testers, SMEs, and sprint teams to work with realistic data right away.
  • Attempt to establish parallel execution pipelines while taking into account the sequential dependencies among tables to maximize throughput.
  • Follow the best practices shared to optimize performance for bulk operations.
  • Consider the service protection API limits (Dataverse) in your load design, for example by retrying executions based on the throttling responses returned by the API (see the sketch after this list).
  • There's a set of characters that can't be saved in string or memo columns. See this article for the steps required to overcome the problem.
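
As a hedged sketch of handling service protection limits: Dataverse returns HTTP 429 responses with a Retry-After header when the limits are hit, and the loop below waits accordingly before retrying. The send_batch function is a placeholder for whatever load call your pipeline uses.

    import time
    import urllib.error

    MAX_RETRIES = 5

    def load_with_retry(send_batch, batch):
        """Call send_batch(batch), honoring 429 throttling responses with Retry-After."""
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return send_batch(batch)
            except urllib.error.HTTPError as error:
                if error.code != 429 or attempt == MAX_RETRIES:
                    raise
                # Service protection responses include a Retry-After header (in seconds).
                wait_seconds = int(error.headers.get("Retry-After", "5"))
                print(f"throttled; retrying attempt {attempt} after {wait_seconds}s")
                time.sleep(wait_seconds)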

Deployment and go-live

After successful implementation and testing efforts, you can execute the go-live stage by following the cutover activities and business validation.

Cutover and execution

The cutover window, during which the production deployment needs to be completed, is critical to the implementation strategy. If the data migration time exceeds the cutover window, you can begin a full load a few days before go-live and execute only the delta load during the cutover window. A delta load requires design considerations for identifying changed and new records. Follow the best practices shared for cutover planning and execution.

Business validation

During cutover activities, it's crucial to ensure that the business processes and data are validated before opening the application for use to other users. Here are some key business validations that need to be scoped:

  • Data integrity: Verify the accuracy and completeness of the data conversion, ensuring that essential master and transactional records, such as customer information, financial products, and financial holdings, are correctly transferred.

  • Process continuity: Validate that critical business processes, such as the onboarding process, can be executed seamlessly in the new environment without disruptions.

  • Security and access control: Ensure that user roles and permissions are configured correctly to maintain data security and compliance with organizational policies. Confirm that only authorized personnel have access to sensitive information.

  • Integration points: Validate the integration with other systems, if applicable, to confirm that data flows smoothly between Dataverse and any other data sources, maintaining data consistency.

Best practices for data conversion go-live

  • Sequence: The data load sequence is important; in some cases, records fail if they aren't loaded in the correct sequence. Referenced records must exist if they appear as a lookup value in any record, so they need to be identified and loaded in the correct order.

  • Bypass plug-in and Power Automate executions: All plug-ins, workflows, and business rules should be disabled during the initial data migration. Leaving plug-ins enabled affects not only performance, but also the integrity of the data. However, for delta loads, you might not have the luxury of disabling custom business logic and automations while the system is being used. You can pass special optional parameters with your load API requests to bypass custom plug-ins and Power Automate, as detailed here. Using these optional parameters eliminates the risk of unintended data generation during delta loads caused by active plug-ins and Power Automate. A minimal sketch of such a request header appears after this list.

  • Restore point and rollback: There are multiple options for backing up and restoring (manual and system backups). On-demand backups should be taken before starting the initial load. If there are any failures during the initial load, the instance can be restored to the snapshot taken at the start of the cutover.
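
As a minimal illustration of the optional bypass parameters mentioned earlier, the snippet below adds the MSCRM.BypassCustomPluginExecution request header to a Dataverse Web API load request. Treat the header set as an assumption to verify against the linked guidance, because the applicable parameters and the privileges they require can differ by scenario.

    # Illustrative headers for a Dataverse Web API load request during data migration.
    # Verify against the linked guidance; the bypass header requires a special privilege
    # granted to the migration user (assumption for this sketch).
    LOAD_REQUEST_HEADERS = {
        "Authorization": "Bearer <token>",
        "Content-Type": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0",
        # Skips synchronous custom plug-in logic for this request only.
        "MSCRM.BypassCustomPluginExecution": "true",
    }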

See also

Next steps