Use-case for large ingested data in near real-time

2JK 241 Reputation points
2021-09-15T11:43:17.187+00:00

Hi.

I have a use case where I expect more than 2 GB of data to be ingested per day. I then have to interpret, process, and store the data, and connect the data store to Power BI. The data will be streamed in near real time, and random peaks should be expected. Historical data has to be persisted up to a point.

I have a few questions about which services best fit my use case. Please note that I'm still fairly new to all of this and have no experience with many of the services I'm listing, so please correct me if I'm wrong anywhere.

  1. Data store: 2 GB of data per day, with historical data persisted for a year (or more), should mean Blob storage, right? We would separate the data into access tiers and configure lifecycle policies to move older data into lower tiers. However, I want a DirectQuery connection to Power BI, and Blob storage doesn't offer that. So that's off the table.
  2. Synapse: I feel like this would be the best option here. The data I'm expecting comes from their own storage and is streamed in near real time into our Azure environment. I'm not sure yet how exactly they plan to send the data, but Synapse should support many linked services, so I don't think it will be a big issue. When we ingest the data, we have to process it, store it in a data store, and connect that store to Power BI for dashboarding. From my research, Synapse persists your data in storage, right? So it can ingest, store, process, and output your data? Would that be enough for my entire use case?
  3. Queues: If the data is being streamed in near real time in large amounts (at least 2 GB per day; could reach double), would I need to set up a queue such as Service Bus in front of Synapse? Or can Synapse handle very large data volumes on its own?
  4. Assuming we go for Synapse, can the historical data still be archived in Blob storage?

What would you recommend? Cost should not be an issue. Any help on this would be appreciated.

Thanks.

Azure Service Bus
An Azure service that provides cloud messaging as a service and hybrid integration.
Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. ShaikMaheer-MSFT 37,896 Reputation points Microsoft Employee
    2021-09-16T06:25:06.753+00:00

    Hi @2JK ,

    Thank you for posting your query on the Microsoft Q&A platform.

    Yes, as you guessed, Azure Synapse Analytics is the best option to go with.

    Q. I want to have a DirectQuery connection to Power BI.
    A. This is possible with Synapse. You can connect a Power BI workspace to an Azure Synapse Analytics workspace and create new Power BI reports and datasets from Synapse Studio. The Synapse documentation on linking a Power BI workspace covers the steps.

    Q. Synapse should support many linked services.
    A. Yes, Synapse supports many connectors. The Synapse documentation has the full list.

    Q. Can Synapse ingest, store, process and output your data?
    A. Yes, Synapse can be used to ingest, store, process and output your data.
    Synapse Pipelines - can be used to ingest and output data.
    Synapse SQL & Synapse Spark - can be used to process or transform your big data at scale.
    Dataflows - can be used to process or transform your data without writing code. You implement your transformation logic in a GUI.

    Q. Can Synapse handle very large data?
    A. Yes, Synapse can handle very large data volumes.
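
To put the volumes from the question in perspective, here is a rough sizing sketch. It assumes 1 GB = 10^9 bytes and a 365-day year, which the thread does not specify, and the function names are purely illustrative. Under those assumptions, 2 GB per day averages out to roughly 23 KB/s, and a year of history is about 730 GB. It says nothing about how large the random peaks are, since the thread gives no figure for them.

```python
# Back-of-the-envelope sizing for the volumes quoted in the question.
# Assumptions (not from the thread): 1 GB = 10**9 bytes, 365-day year.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_kb_per_s(gb_per_day: float) -> float:
    """Average ingest rate in kilobytes per second for a daily volume in GB."""
    bytes_per_day = gb_per_day * 10**9
    return bytes_per_day / SECONDS_PER_DAY / 10**3


def yearly_volume_gb(gb_per_day: float, days: int = 365) -> float:
    """Total volume in GB accumulated over the retention period."""
    return gb_per_day * days


if __name__ == "__main__":
    # 2 GB/day is the stated baseline; 4 GB/day is the "could reach double" case.
    for daily_gb in (2, 4):
        rate = average_rate_kb_per_s(daily_gb)
        total = yearly_volume_gb(daily_gb)
        print(f"{daily_gb} GB/day: about {rate:.1f} KB/s average, {total:.0f} GB per year")
```

The averages hide the bursts, so any ingestion layer would still need to be sized for the peak rate rather than the mean.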

    Q. Assuming we go for Synapse, can the historical data still be archived in Blob storage?
    A. Yes, that is possible. You can have a Synapse Pipeline archive your historical data to Blob storage.
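
Once historical data lands in Blob storage, the lifecycle policies mentioned in the question can move it to cheaper tiers as it ages. The sketch below builds such a policy as a Python dictionary and prints it as JSON in the shape Azure Storage lifecycle management expects. The prefix `telemetry/history/`, the rule name, and the 30/180-day thresholds are hypothetical values chosen for illustration, not recommendations from the answer.

```python
import json
from typing import Optional


def build_lifecycle_policy(
    prefix: str,
    cool_after_days: int = 30,
    archive_after_days: int = 180,
    delete_after_days: Optional[int] = None,
) -> dict:
    """Build a Blob storage lifecycle policy that tiers blobs down as they age.

    All thresholds are days since the blob was last modified.
    """
    if not 0 <= cool_after_days < archive_after_days:
        raise ValueError("cool_after_days must be >= 0 and below archive_after_days")
    if delete_after_days is not None and delete_after_days <= archive_after_days:
        raise ValueError("delete_after_days must be above archive_after_days")

    actions = {
        "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
        "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
    }
    if delete_after_days is not None:
        actions["delete"] = {"daysAfterModificationGreaterThan": delete_after_days}

    return {
        "rules": [
            {
                "enabled": True,
                "name": "ageOutHistory",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # prefixMatch starts with the container name.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {"baseBlob": actions},
                },
            }
        ]
    }


if __name__ == "__main__":
    # "telemetry/history/" is a made-up container/prefix for this example.
    print(json.dumps(build_lifecycle_policy("telemetry/history/"), indent=2))
```

Keep in mind that blobs in the archive tier are offline and have to be rehydrated before they can be read, so data that reports still query should stay in hot or cool.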

    Please note that you can also use Azure Stream Analytics to stream data into Azure Synapse. The Stream Analytics documentation has more detail.

    I hope this helps. Please let us know if you have any further queries.
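
A streaming job of this kind typically aggregates events over fixed time windows before writing them to the warehouse. The following is a plain-Python illustration of a tumbling (fixed, non-overlapping) window average, not Stream Analytics code. The `Reading` type, its field names, and the 60-second window are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Reading(NamedTuple):
    """Hypothetical event shape; the thread does not describe the real schema."""
    timestamp: float  # seconds since some epoch
    device: str
    value: float


def tumbling_window_avg(
    readings: Iterable[Reading], window_s: int = 60
) -> Dict[Tuple[int, str], float]:
    """Average `value` per device over fixed, non-overlapping windows.

    Keys are (window start in seconds, device).
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for r in readings:
        # Each event belongs to exactly one window: the one its timestamp falls in.
        window_start = int(r.timestamp // window_s) * window_s
        acc = sums[(window_start, r.device)]
        acc[0] += r.value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


if __name__ == "__main__":
    events = [
        Reading(0, "sensor-a", 1.0),
        Reading(30, "sensor-a", 3.0),
        Reading(61, "sensor-a", 5.0),
    ]
    for (start, device), avg in tumbling_window_avg(events).items():
        print(f"window starting at {start}s, {device}: average {avg}")
```

Aggregating before the load reduces the number of rows written downstream, which is one way bursts in the raw stream can be smoothed out.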

