Data import

In the world of sustainability accounting and reporting, data transformation and import play crucial roles in ensuring accurate, reliable, and comprehensive sustainability insights. However, organizations often face technical and business challenges in effectively managing and harnessing sustainability data for reporting purposes.

Some of the key challenges include:

  • Data granularity and complexity: Managing diverse sustainability data from multiple sources and categories is essential for creating a comprehensive and granular picture of environmental efficiency progress, encompassing carbon, water, and waste efficiency.

  • Data quality and consistency: Accurate, reliable, and consistent data throughout an organization’s sustainability journey is important for accurate reporting and compliance.

  • Scalability and performance: Handling and processing large datasets efficiently while maintaining optimal performance is crucial for organizations in the near and long term.

  • Data availability: Data can come in different formats or require to be captured manually.

In addition, businesses face increasing pressure from stakeholders, investors, customers, and regulatory bodies to disclose comprehensive sustainability information. Meeting these expectations requires efficient data transformation and ingestion processes.

This section focuses on the latest enhancement and capabilities of the Ingestion process that builds on the existing capabilities of Microsoft Sustainability Manager. The new capabilities provide a streamlined and advanced platform to manage the data transformation and ingestion process. It provides a simplified user interface, multiple methods of data ingestion, templates, and mapping capabilities.

Data import user experience

A simplified and guided user interface is a much-needed capability for organizations to successfully perform the data transformation and ingestion of their environmental data.

The Sustainability Manager data ingestion experience keeps the data model at the forefront during the ingestion experience to limit user errors. Users can take advantage of the flexibility and multiple ingestion methods based on their requirements and design considerations. In addition, data can be ingested across multiple entities and categories through a one-time import using the same ingestion. These ingestion mechanisms are standardized across carbon, water, and waste.

An important aspect of the data ingestion process is the decoupling of data transformation, data mapping, and data import. This decoupling also helps in mitigating the risks of human errors in extracting, transforming, and loading the data.

During the data connection process, organization users can utilize the data transformation tools sidebar to:

  • Configure datasets to be imported.

  • Use checklists to validate datasets. Checklists ensure successful data import and address issues related to mandatory data columns, data type mismatches, populating option set values, and looking up reference data.

Built-in ingestion methods and design considerations

Sustainability Manager provides four methods for ingestion, depending on:

  • Data type
  • Source
  • Data transformation needs
  • User experience level
  • Import frequency

You can use these different methods with the following considerations.

Method and description Design considerations Use when
Excel templates are built into the Microsoft Sustainability Manager and users can download, prepare their data, and import to specific entities. Excel templates are refreshed on a periodic basis. We recommend that you download the latest Excel templates during each import.

As the Excel templates follow the data model naming convention, they don't require data mapping to the model.

Excel templates represent the simplest form of data import. They're suitable for smaller set of data that an Excel file can support and for initial system preparation.

Excel templates contain predefined set of entities to enable data import.
Excel templates are best used when the data preparation/transformation is done outside Sustainability Manager and the data is ready for a direct import.

Example: Contoso coffee, a Coffee manufacturer can use the Excel templates to set up their facilities using the Reference data template, then proceed to set up their Carbon activity and precalculated emissions data using the Activity and Emissions data templates. Then, they can do a one-item import of all three types of data with a single ingestion.
Power Query guided experience is a data-mapping centric experience, guiding the end users with a shopping cart experience to minimize ingestion errors. Power Query provides a comprehensive approach to data transformation and ingestion, making it efficient for handling various data sources.

Power Query is beneficial for data sources that have built-in connectors available in Microsoft Sustainability Manager, streamlining the data import process.

The Power Query guided experience provides flexibility and detailed guidance throughout the ingestion process, allowing users to add and remove entities as needed.
Use Power Query when you need to:

- import data for multiple entities from various sources.
- have built-in Power Query connectors available in Microsoft Sustainability Manager.
- require frequent data import and refresh
- need data transformation during the ingestion process.
- prefer guided experience with flexibility for entity manipulation and data mapping.
- different source types need to be handled with data transformation before mapping to the data model.

Example: Contoso Coffee, a Coffee manufacturer, wants to import the emissions data from their supply chain ERP and supplier provided data in Azure Synapse.

Contoso can use the Power Query guided experience to transform and map using the detailed guidance provided in the editor.
Partner Connectors – are a catalog of connectors to import data from partner solutions. Sustainability Manager includes connectors to import data from partner solutions.

The platform offers a catalog of prebuilt and custom connectors from operational data providers.

An important aspect of importing data using the partner connectors is that data mapping and transformation are encapsulated within the connector. They don't have to be performed during the import process.
Use when you need to import data from a partner solution and the corresponding connector is present in the catalog.

Example: Contoso Coffee uses Arcadia's utility cloud solution to capture purchased electricity and natural gas data for its facilities.

They utilize the Arcadia connector to import purchased electricity data into Sustainability Manager. This connector allows them to calculate their scope 2 emissions for all their global facilities.
Manual import of Data into Sustainability Manager Microsoft Sustainability Manager provides two methods to manually import data.

The first method allows entry of individual records, useful for importing Configuration data such as Reference data.

The second method allows bulk import of activity records and reference data records.

For scope 3, category 8 and category 13, you need to select the data subcategory to access the bulk upload options.

When updating activity data records, you must delete previously imported data and reimport all the data.
Consider using the manual data import options in Sustainability Manager when you need to import individual records of activity, water, or waste data.

Additionally, utilize these options when you want to import reference data or when you have a large volume of activity data to import in bulk.

If you have specific data formats for bulk import, such as Excel, CSV, or XML, the manual bulk upload option is the ideal choice.

For example, a sustainability specialist setting up organization data goes to Settings > Company profile > General to manually define the attributes of the organization.

The following table summarizes the design considerations for organizations while designing their data import process. We recommend that you carefully consider the pros and cons of each ingestion method and their design tradeoffs.

Ingestion method Excel templates Power Query guided experience Partner connectors Manual entry
Typical user personas Business Personas Business/Technical Personas Technical Personas Business Personas
Can be scheduled?
Data load volume Bulk Bulk Bulk Single, bulk
Allows data transformation?
Provides incremental refresh?
Data overrides during import?
Can import multiple entities? *
Provides ingestion error handling? NA

Note

Importing multiple entities using the partner connectors in a single ingestion is in the product roadmap.

To learn more, review Overview of data import in Microsoft Sustainability Manager.

In summary, the following considerations are important while importing data into Microsoft Sustainability Manager:

  • Start with the simpler Excel template method for ingestion experience. Based on the data complexity and understanding of the data model, evolve to create continuous data imports using the Power Query guided experience.

  • During the Power Query guided ingestion experience, keep the following tips in mind:

    • Use the Power Query capabilities to perform the data transformations.
    • Use the data transformation tools and checklist to review the data tables and their respective data types.
    • While mapping the source data, use the automap feature.
  • Schedule the refresh time slots and rules to enable the continuous import of required data once a connection is created and validated.

Full and incremental data refresh

Periodic import of data from data sources is a necessity for organizations to calculate their most recent sustainability data for reporting purposes. Organizations can get data from their source systems in two modes.

Full data refresh: Import a full copy of source system data into the Microsoft sustainability Manager application.

Incremental data refresh: Fetch and import only the updated or newly created records from the source systems since the last refresh.

Incremental data refresh allows organizations to bring in new data when it’s available for each data refresh. Organizations can also opt for full refresh if data needs to be fully refreshed during each query. This capability is only available for Power Query based import.

To take advantage of this built-in capability, create refresh policies while creating the connection for data sources. Microsoft Sustainability Manager provides this capability for Power Query based import method. During the process of creating a connection, users can define schedule and frequency to run the data import jobs and select the option to remove previously imported data if needed. This action gives the ability to schedule an incremental refresh.

Full data refresh is recommended for the following situations:

  • When the source systems data model undergoes any changes, the changes need to be reflected in the application for data transformations or mapping.

  • When the source data has changed significantly, and the incremental refresh isn't sufficient to capture the changes.

  • When the source system’s business logic has been modified and the existing data needs to be recalculated.

  • When the data quality or the data integrity has been compromised due to errors, corruption, or malicious attacks.

Incremental data refresh is recommended for the following situations:

  • When organizations require bulk import of data from multiple data sources

  • When organizations require timely refresh of data based on specific timelines

  • When the source systems provide historical, cumulative data and organizations need to import newly available data (insert or upsert) after the initial data load

For example, if Sustainability Manager is ingesting data from Excel files in a SharePoint document library, updated by the suppliers, it requires a refresh policy to consume the most recent data on a periodic basis.

While the scheduling at the data connection level provides a way of designing data import schedule. It doesn't solve the problem of fetching only the delta data from various source systems. With the incremental refresh process, data admins can configure granular rules for fetching the incremental data only. The process can provide extra filters, rules, and time stamps at each query level in a data connection.

To learn more about incremental refresh, go to Incremental data refresh in Microsoft Sustainability Manager

Network connectivity to data sources

During the design of data import strategies, organizations often face the challenge of securely and efficiently connecting to remote data sources. These remote data sources could be in the cloud or on-premises. Different security standards and measures that organizations use to protect their data assets further compounds this challenge. Thus, the networking components necessitate careful consideration of various options to ensure data security and reliability.

This section addresses two methods of connecting to remote data sources. These data sources could exist in Azure Cloud within a virtual network, or they could exist in an on-premises location. To connect from Sustainability Manager to Azure Data Services within a virtual network, organizations can use the virtual network data gateway. To connect to on-premises data sources, you can use the on-premises data gateway.

The following table constructs some of the important networking topics and design considerations while importing data from a remote source located in an on-premises infrastructure:

Topic Design consideration
Virtual network data gateway Use a virtual network data gateway to connect to data sources located in Azure and secured by a virtual network. This approach can be useful when the data source isn't publicly accessible.

To learn how to configure the virtual network data gateway, go to Use virtual network data gateway and data sources
On-premises data gateway Use on-pPremises gateways as a secure bridge between the on-premises sustainability data sources and Sustainability Manager. You need to download, install, and configure the on-premises data gateway.

To learn more, go to What is an on-premises data gateway?
Data source authentication Prioritize strong authentication mechanism for accessing sustainability data. Avoid using weak credentials and consider using Windows or OAuth-based authentication for enhanced security.
Add IPs to allowlist from outbound on-premises data source Ensure the allowlist of IP addresses to enable outbound communication between on-premises resources and Sustainability Manager.

To review IP address ranges to allowlist, go to Managed connectors outbound IP addresses
Performance Optimize queries to minimize data transferred over the network. Optimized queries reduce latency, increase performance, and conserve network resources.
Monitoring and auditing Prevent security breaches by implementing monitoring and logging for gateways and conducting periodic reviews with the organization’s security stakeholders.

To learn more about collecting logs from the data gateway, go to Troubleshoot the on-premises data gateway

In summary, making secure and reliable networking design choices is critical to ensuring a robust import mechanism for remote data sources. These include performing the necessary steps to:

  1. Install gateways for remote on-premises data sources and virtual network secured Azure hosted data sources.
  2. Using strong authentication mechanisms.
  3. Add to the allowlist the outbound IP addresses for enabling communication from on-premises data locations.
  4. Minimize data transferred over the network.
  5. Perform regular security audits.

Error handling

Error handling during the data imports plays a vital role to make sure your data is imported properly and dashboards are showing the latest information for Sustainability reports & analytics. Sustainability Manager lets users review and fix errors in the data imports. These error reports are now available for all three import types, templates, Power Query guided flow, and partner experience.

For more information, go to Error handling for data import in Microsoft Sustainability Manager

Next steps