environment design

Question

arkiboys 9,711

As you can see below, we are trying to handle real-time data and store it, perhaps in ADLS Gen2.

Requirements:
1- Capture real-time transaction data from external providers and store it in Azure (possibly ADLS Gen2).
2- Thousands (not millions) of rows come in.
3- Each feed has a lot of columns; some have over 100.
4- Avoid any negative impact on performance.
5- Make the captured data in Azure available to all systems, e.g. Power BI, Python/.NET apps, etc.
6- The data arrives in different file formats or from various data sources.
7- Feeds have many columns and possibly thousands of rows.
8- Data arrives regularly, at various times.
9- We may need to scrape websites to pull some of the data.
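Requirements 3 and 6 mean that each feed has to be mapped onto a shared column layout before the data lands in one place. Below is a minimal stdlib sketch of that step, assuming CSV and JSON-lines feeds and a hand-written column map per feed; `normalize_feed` and all column names are hypothetical. In a Synapse or Databricks notebook this would more likely be Spark code, but the idea is the same.

```python
import csv
import io
import json
from typing import Any, Dict, List


def normalize_feed(payload: str, fmt: str,
                   column_map: Dict[str, str]) -> List[Dict[str, Any]]:
    """Parse one feed and rename its columns to a shared target schema.

    column_map maps source column name -> target column name. Source columns
    not in the map are dropped; mapped columns missing from a row become None.
    """
    if fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(payload)))
    elif fmt == "jsonl":  # one JSON object per line
        rows = [json.loads(line) for line in payload.splitlines() if line.strip()]
    else:
        raise ValueError(f"unsupported feed format: {fmt!r}")
    return [{target: row.get(source) for source, target in column_map.items()}
            for row in rows]


if __name__ == "__main__":
    csv_feed = "TxnId,Amt,Extra\n1,10.5,x\n2,3.0,y\n"
    jsonl_feed = '{"transaction_id": "3", "amount": 7.25}\n'
    print(normalize_feed(csv_feed, "csv", {"TxnId": "txn_id", "Amt": "amount"}))
    print(normalize_feed(jsonl_feed, "jsonl",
                         {"transaction_id": "txn_id", "amount": "amount"}))
```

CSV values come back as strings while JSON keeps its own types, so casting to a typed schema would be a separate step before writing the files.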

Questions:

Should we:
use a Synapse workspace?
store the captured data in .parquet?
use a serverless or dedicated SQL pool?
use Azure SQL Server?
use Event Hubs to capture the real-time data?
use Databricks or Synapse notebooks?
etc.

Thank you

Answer accepted by question author

Answer 1

Hello @arkiboys ,

Thanks for the question and for using the Microsoft Q&A platform.

If I understand your question correctly, you are looking for services to take data from external sources, load it into ADLS, and then serve it from there to downstream apps such as Power BI, Python or .NET apps.

Azure Synapse pipelines and Azure Data Factory have connectors for a large number of sources, so I would recommend using them to load data into ADLS from your source systems. See the ADF and Synapse connector documentation for the full list of connectors. If your sources are sensors, IoT devices or event streams, then you can consider Event Hubs.
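Storing the captured data in ADLS in Parquet format is a good choice: Parquet is a columnar format that stores large datasets efficiently and is well suited to big data analytics. Because it is columnar, a query that needs only a few of your 100+ columns reads only those columns.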
If you want to query the ADLS files directly, a serverless SQL pool can do the job using external tables. If you need provisioned SQL resources with their own storage and compute, then consider a dedicated SQL pool.
Since most of these resources fall under the Synapse umbrella, using Synapse notebooks will give better manageability. I don't think Azure Databricks is needed here, as Synapse notebooks can do the same job within the workspace.
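For the Event Hubs option in the first point above, events arrive one at a time, whereas Parquet files and serverless SQL generally perform better with fewer, larger files. Event Hubs Capture can already write events to ADLS Gen2 (as Avro files) on a configurable time or size window. If you consume the stream yourself, for example from a notebook, the same micro-batching idea looks roughly like this minimal sketch. `MicroBatcher`, its thresholds and the `flush` callback are hypothetical; in practice `flush` would write one file per batch.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Hypothetical helper: buffer events and hand them to `flush` as one batch
    once either `max_events` or `max_seconds` (since the first buffered event)
    is reached."""

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_events: int = 1000, max_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_events = max_events
        self._max_seconds = max_seconds
        self._clock = clock  # injectable so the time window can be tested
        self._buffer: List[dict] = []
        self._window_start: Optional[float] = None

    def add(self, event: dict) -> None:
        if self._window_start is None:
            self._window_start = self._clock()
        self._buffer.append(event)
        # The time window is only checked when an event arrives; a real
        # consumer would also flush on a timer.
        full = len(self._buffer) >= self._max_events
        expired = self._clock() - self._window_start >= self._max_seconds
        if full or expired:
            self.flush()

    def flush(self) -> None:
        """Emit whatever is buffered (e.g. write one file) and reset."""
        if self._buffer:
            self._flush(self._buffer)
        self._buffer = []
        self._window_start = None


if __name__ == "__main__":
    batches: List[List[dict]] = []
    batcher = MicroBatcher(batches.append, max_events=3)
    for i in range(7):
        batcher.add({"txn_id": i})
    batcher.flush()  # drain the partial last batch
    print([len(b) for b in batches])  # [3, 3, 1]
```

With thousands (not millions) of rows per feed, a fairly generous time window keeps the number of small files down without delaying the data much.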

Since you need data movement and transformation from external sources into Azure, and then need to serve the data to Power BI or applications, the solution above should work best.
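To make the ADLS + Parquet + serverless SQL pool part of this more concrete, here is a minimal Python sketch. It assumes a hypothetical storage account `mydatalake`, a container `landing`, and a Hive-style `year=/month=/day=` folder layout (a common convention, not something ADLS requires). `landing_path` builds the `abfss://` path a pipeline or notebook could write each Parquet file to, and `serverless_query` returns the kind of `OPENROWSET` T-SQL a serverless SQL pool can run over those files without loading them first.

```python
from datetime import datetime, timezone

# Hypothetical names: replace with your own storage account and container.
ACCOUNT = "mydatalake"
CONTAINER = "landing"


def landing_path(source: str, arrived_at: datetime, batch_id: str) -> str:
    """abfss:// path (as used by Spark/notebooks) for one Parquet file,
    partitioned by UTC arrival date."""
    ts = arrived_at.astimezone(timezone.utc)  # naive datetimes are treated as local time
    return (
        f"abfss://{CONTAINER}@{ACCOUNT}.dfs.core.windows.net/"
        f"raw/{source}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{source}_{ts:%Y%m%dT%H%M%S}_{batch_id}.parquet"
    )


def serverless_query(source: str, top: int = 100) -> str:
    """T-SQL that reads every Parquet file of one source via the https dfs
    endpoint; the wildcards match the year/month/day folders."""
    url = (f"https://{ACCOUNT}.dfs.core.windows.net/{CONTAINER}/"
           f"raw/{source}/year=*/month=*/day=*/*.parquet")
    return (
        f"SELECT TOP {int(top)} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    arrived = datetime(2022, 2, 17, 7, 20, 36, tzinfo=timezone.utc)
    print(landing_path("fx_rates", arrived, "0001"))
    print(serverless_query("fx_rates"))
```

An external table over the same folder would serve Power BI and app consumers in the same way. Only build such query strings from trusted configuration values, since they are concatenated directly into T-SQL.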

Please do let me know how it goes.

arkiboys 9,711 Reputation points

2022-02-17T07:20:36.627+00:00

Hi, thanks for the message.
Is there a documentation link that recommends a Synapse workspace for requirements like mine?
Thanks
ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator

2022-02-17T15:11:54.587+00:00

Hi @arkiboys ,

Kindly check the documentation below and see if it helps.
https://learn.microsoft.com/en-us/azure/architecture/example-scenario/dataplate2e/data-platform-end-to-end?tabs=portal
