Best approach to get telemetry data for near-real-time analysis in Power BI and build up historical data at the same time

Gustavo Adolfo Alemán Sánchez 40 Reputation points
2024-02-14T07:08:15.71+00:00

Hi, my plan is to collect telemetry data from a KQL database in ADX. Currently the service retains the data for only 30 days; I want to keep it for at least 2 years and, at the same time, have dashboards of the current telemetry. The goal is a low-complexity, low-latency architecture.

My plan is to use Event Hubs to feed ADX directly and also route the stream to ASA jobs that aggregate the data and store it in ADLS as Avro files for historical retention, then run daily upserts and incremental loads into a dedicated SQL pool in Synapse, perhaps with some Spark notebooks to build staging tables before merging into the dedicated tables, so that the pipelines are properly idempotent and follow an SCD Type 1 approach.
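Roughly, I imagine the ASA aggregation step looking something like this (the input/output aliases and the deviceId/temperature fields are just placeholders, not the real schema; Avro serialization would be configured on the ADLS output itself):

```sql
-- Sketch of the ASA job: 5-minute aggregates from the Event Hub input,
-- written to an ADLS Gen2 output configured with Avro serialization.
-- [telemetry-input], [adls-avro-output], deviceId and temperature are
-- placeholder names, not the actual schema.
SELECT
    deviceId,
    System.Timestamp() AS windowEnd,        -- end of the tumbling window
    AVG(temperature)   AS avgTemperature,
    COUNT(*)           AS readingCount
INTO [adls-avro-output]
FROM [telemetry-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY deviceId, TumblingWindow(minute, 5)
```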

Then I would create two connections to Power BI: one against Synapse for historical analysis, and another fed by the ASA job for the daily updates, which as far as I know is possible (https://learn.microsoft.com/en-us/azure/stream-analytics/power-bi-output). I am just not sure whether this approach would work smoothly or whether there is a better way to achieve low latency and low complexity.
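The same ASA job would then have a second, finer-grained output pushing to a Power BI streaming dataset, along these lines (again, placeholder names):

```sql
-- Second output in the same ASA job: a finer-grained aggregate pushed to a
-- Power BI streaming dataset output. [powerbi-output] is a placeholder alias.
SELECT
    deviceId,
    System.Timestamp() AS windowEnd,
    AVG(temperature)   AS avgTemperature
INTO [powerbi-output]
FROM [telemetry-input] TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY deviceId, TumblingWindow(second, 10)
```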

Thanks in advance :)

Azure Data Explorer
An Azure data analytics service for real-time analysis on large volumes of data streaming from sources including applications, websites, and internet of things devices.
Azure Event Hubs
An Azure real-time data ingestion service.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
Amira Bedhiafi 16,071 Reputation points
2024-02-14T11:42:25.6533333+00:00

Your Current Plan Overview

You plan to ingest the telemetry through Azure Event Hubs into ADX and, in parallel, feed ASA jobs that aggregate the data and persist it as Avro files in ADLS for historical retention. From there you would run daily upserts and incremental loads into a Synapse dedicated SQL pool, using Spark notebooks to build staging tables that are merged into the dedicated tables, with idempotent pipelines and a Type 1 Slowly Changing Dimension (SCD) approach.

You plan to create two connections to Power BI: one against Synapse for historical analysis and another for daily updates from the ASA job.

What do I recommend?

Your data path involves many steps and services, which adds latency and complexity. Think about removing steps or consolidating processes where it makes sense. For instance, Azure Synapse has native integration with both ADLS and Power BI, so you may be able to simplify the architecture by dropping the dependency on ASA for aggregation, provided Synapse can handle the aggregation workload directly.
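For example, a Synapse serverless SQL pool can query files in ADLS in place. A minimal sketch follows; the storage path is hypothetical, and it assumes the historical files are landed as Parquet, since serverless OPENROWSET reads CSV, Parquet and Delta Lake:

```sql
-- Sketch: query historical aggregates directly from ADLS Gen2 with a
-- Synapse serverless SQL pool. The account, container and folder names
-- are hypothetical placeholders.
SELECT TOP 100 *
FROM OPENROWSET(
         BULK 'https://<storageaccount>.dfs.core.windows.net/telemetry/history/*.parquet',
         FORMAT = 'PARQUET'
     ) AS history;
```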

Azure Synapse Link (for Azure Cosmos DB and other services) provides near-real-time analytical copies of your data without disrupting transactional workloads. Using such "link" services to cut latency and complexity in data pipelines is only an option if similar integrations exist for the services you use now or plan to use in the near future.

Make sure the chosen architecture scales to your data volumes and performance requirements. For example, avoid using Spark pools in Synapse for simple transformations that ASA jobs or SQL pool stored procedures can handle; Spark may be more resource-intensive than necessary for that work.
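As a rough sketch of the stored-procedure route (all table and column names below are assumptions, not from your post), a Type 1 upsert from a staging table can be written as an idempotent UPDATE followed by an INSERT:

```sql
-- Sketch: idempotent SCD Type 1 upsert from a staging table into the
-- dedicated table. stg.TelemetryDaily, dbo.TelemetryDaily and the column
-- names are assumed for illustration only.

-- Overwrite existing rows with the latest values (Type 1 behaviour).
UPDATE tgt
SET    AvgTemperature = src.AvgTemperature,
       ReadingCount   = src.ReadingCount
FROM   dbo.TelemetryDaily AS tgt,
       stg.TelemetryDaily AS src
WHERE  src.DeviceId    = tgt.DeviceId
  AND  src.ReadingDate = tgt.ReadingDate;

-- Insert only the rows that do not exist yet, so re-running the batch
-- leaves the target unchanged.
INSERT INTO dbo.TelemetryDaily (DeviceId, ReadingDate, AvgTemperature, ReadingCount)
SELECT src.DeviceId, src.ReadingDate, src.AvgTemperature, src.ReadingCount
FROM   stg.TelemetryDaily AS src
WHERE  NOT EXISTS (
           SELECT 1
           FROM   dbo.TelemetryDaily AS tgt
           WHERE  tgt.DeviceId    = src.DeviceId
           AND    tgt.ReadingDate = src.ReadingDate
       );
```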

If you need real-time dashboards, favor Power BI's DirectQuery mode for the ASA job outputs, since it lets you query the data directly instead of importing it into Power BI. That can reduce latency for dashboard updates, although you should assess the effect on query performance and cost. For historical analysis, Import mode may be the more suitable option, especially when the data is large and updates are infrequent.

Monitor your data pipeline and Power BI dashboards continuously with Azure Monitor and other performance-monitoring tools. Look for bottlenecks or inefficiencies in the data flow, query performance issues, and wasted resources. Periodically review and adjust your data models, queries, and pipeline structure to keep latency and complexity in check.

    2 people found this answer helpful.

0 additional answers
