How to set up a modern architecture for a small/medium business?

Jordan 25 Reputation points

Currently we're using the following setup, which is slow to process the data and slow on the Power BI side:

  1. Azure VM for third parties to upload via SFTP
  2. C# script to ETL data into Azure SQL Server and move files to ADLS Gen2
  3. Power BI report pulling directly from SQL Server

Every month we're ingesting roughly 20M rows, and this will increase as more customers are onboarded. I'm looking at the following setup:

  1. SFTP directly to ADLS Gen2
  2. Daily ingest to Azure Databricks
  3. Create Dataflow in PBI and scheduled automatic refresh
  4. Create Semantic Model based on Dataflow
  5. Build Reports based on new semantic model

Is this an optimal flow or would there be a better setup?


Accepted answer
  1. Amira Bedhiafi 18,106 Reputation points

I think what you are proposing looks promising and more scalable than your current setup. However, here are some points to consider to ensure optimal performance, scalability, and maintainability:

    1. Ingestion:
      • SFTP to ADLS Gen2:
        • Directly ingest data from third parties to Azure Data Lake Storage (ADLS) Gen2.
        • Orchestrate the SFTP to ADLS Gen2 process using Azure Data Factory (ADF) or Azure Logic Apps. This provides better monitoring and error-handling capabilities.
    2. Data Processing and Transformation:
      • Azure Databricks:
        • Use Azure Databricks for scalable and efficient data processing.
        • Schedule daily jobs in Databricks to process and transform the ingested data.
        • Store processed data back into ADLS Gen2 in a structured format (e.g., Parquet).
    3. Data Storage:
      • Azure Data Lake Storage (ADLS) Gen2:
        • Use ADLS Gen2 for both raw and processed data storage.
        • Implement a data lakehouse architecture to store structured and unstructured data.
    4. Data Modeling and Analysis:
      • Power BI:
        • Create Power BI dataflows for initial data transformations and aggregations. This can simplify your ETL processes and make your datasets reusable.
        • Build a Semantic Model (Data Model) in Power BI based on Dataflows for better performance and ease of use.
        • Schedule automatic refreshes for Dataflows to ensure data is up-to-date.
    5. Data Governance and Security:
      • Implement proper data governance policies, including data lineage, data quality checks, and access controls.
      • Use Azure Active Directory (AAD) for managing user permissions and roles.
      • Enable logging and monitoring using Azure Monitor and Azure Log Analytics to track data processing and usage.
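The daily processing step (point 2 above) can be sketched as follows. This is a minimal illustration using only the Python standard library; in Databricks this logic would be PySpark writing Parquet, and the file names, column layout, and partition scheme here are assumptions, not your actual schema:

```python
import csv
from pathlib import Path

def process_daily_extract(raw_file: Path, processed_root: Path) -> int:
    """Read a raw CSV drop, skip malformed rows, and write the clean
    rows into date-partitioned folders (lakehouse-style layout).
    Returns the number of rows written."""
    partitions = {}  # date string -> list of clean rows
    with raw_file.open(newline="") as f:
        for row in csv.DictReader(f):
            # Basic data-quality check: require an ISO date and an amount.
            date = (row.get("event_date") or "").strip()
            if len(date) != 10 or not row.get("amount"):
                continue  # in production, route these to a quarantine zone
            partitions.setdefault(date, []).append(row)

    written = 0
    for date, rows in partitions.items():
        # Hive-style partition folder, e.g. processed/event_date=2024-05-01/
        part_dir = processed_root / f"event_date={date}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with (part_dir / raw_file.name).open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=rows[0].keys())
            writer.writeheader()
            writer.writerows(rows)
        written += len(rows)
    return written
```

In PySpark the equivalent is roughly `spark.read.csv(...).filter(...).write.partitionBy("event_date").parquet(...)`; the point is the same Hive-style partition layout, which lets downstream readers prune to only the dates they need.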

Your flow diagram:

    1. Ingestion:
      • SFTP → ADF/Logic Apps → ADLS Gen2
    2. Processing:
      • ADLS Gen2 → Databricks → ADLS Gen2 (processed data)
    3. Analysis:
      • ADLS Gen2 (processed data) → Power BI Dataflows → Power BI Semantic Model → Power BI Reports

1 additional answer

  1. Sumarigo-MSFT 44,586 Reputation points Microsoft Employee

@Jordan, welcome to the Microsoft Q&A forum, and thank you for posting your query here!

    The setup you have proposed is a good start for a modern architecture for small/medium businesses. However, there are a few additional considerations you may want to take into account:

    Data ingestion: Consider using Azure Data Factory to orchestrate your data ingestion workflows. This can help you automate and manage your data pipelines more effectively, and can also provide more flexibility in terms of data sources and destinations.
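The core of what an orchestrator like Data Factory does at this stage is a repeatable, re-runnable move from the SFTP landing area into a dated raw zone. Here is a hedged sketch of that logic in plain Python (ADF would do this with a Copy activity; the folder names and the `*.csv` pattern are assumptions for illustration):

```python
import shutil
from datetime import date
from pathlib import Path

def land_to_raw(landing: Path, raw_root: Path, run_date: date) -> list:
    """Move newly uploaded files from the SFTP landing area into a
    dated raw-zone folder, skipping files already ingested so the
    job is safe to re-run (idempotent)."""
    target = raw_root / run_date.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for f in sorted(landing.glob("*.csv")):
        dest = target / f.name
        if dest.exists():
            continue  # already ingested on a previous run
        shutil.move(str(f), str(dest))
        moved.append(f.name)
    return moved
```

Idempotency is the design point here: when a pipeline run fails halfway and is retried, files that already landed in the raw zone are skipped rather than duplicated, which is the same behavior you would configure in an ADF pipeline.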

    Data storage: Consider using Azure Synapse Analytics as your data warehouse solution. This can provide a more scalable and performant solution for storing and processing large volumes of data.

    Data processing: Consider using Azure Databricks for your data processing needs. This can provide a more scalable and efficient solution for processing large volumes of data, and can also integrate with other Azure services such as Azure Data Factory and Azure Synapse Analytics.

    Data visualization: Consider using Power BI for your data visualization needs. Power BI can provide a more flexible and customizable solution for creating reports and dashboards, and can also integrate with other Azure services such as Azure Synapse Analytics and Azure Databricks.

    Overall, the key to a modern architecture for small/medium businesses is to leverage cloud-based services that can provide scalability, flexibility, and performance. By using Azure services such as Azure Data Factory, Azure Synapse Analytics, Azure Databricks, and Power BI, you can create a more efficient and effective solution for your data processing and reporting needs.

    The proposed setup you've outlined is a modern and scalable architecture that leverages Azure's big data and analytics services. Here's a breakdown of the benefits and some additional considerations:

    Benefits of Your Proposed Setup:

    • SFTP to ADLS Gen 2: Directly ingesting data into Azure Data Lake Storage Gen2 simplifies the architecture and reduces the need for an intermediate Azure VM.
    • Azure Databricks: Utilizing Databricks for daily data ingestion and transformation offers a robust and scalable solution that can handle increasing data volumes efficiently.
    • Power BI Dataflow: Creating a dataflow in Power BI allows for data to be prepared and transformed before it reaches the semantic model, improving performance.
    • Semantic Model: Building a semantic model based on the dataflow ensures that the data is in an optimal format for reporting and analysis.
    • Reports on Semantic Model: Generating reports based on the semantic model will likely be faster and more efficient, as the data has already been processed and structured appropriately.
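The performance benefit in the last two points comes largely from pre-aggregation: the semantic model holds one row per month and customer instead of 20M transaction rows. A minimal sketch of that reduction, with an assumed schema (`event_date`, `customer`, `amount`) purely for illustration:

```python
from collections import defaultdict

def monthly_summary(rows):
    """Collapse row-level transactions into one row per (month, customer),
    the kind of pre-aggregation a dataflow or Databricks job can do before
    the data reaches the Power BI semantic model."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        key = (r["event_date"][:7], r["customer"])  # YYYY-MM bucket
        totals[key] += float(r["amount"])
        counts[key] += 1
    return [
        {"month": m, "customer": c,
         "total_amount": totals[(m, c)], "row_count": counts[(m, c)]}
        for (m, c) in sorted(totals)
    ]
```

Whether this aggregation happens in Databricks or in a Power BI dataflow is a trade-off between compute cost and refresh time, but doing it anywhere upstream of the semantic model keeps the model small and the report visuals fast.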

    Additional Considerations:

    • Data Volume and Velocity: As the data volume grows, ensure that the architecture can scale accordingly. Azure Databricks is designed for scalability, but monitoring and adjustments may be necessary.
    • Data Refresh Scheduling: With the automatic refresh, consider the timing and frequency to balance the load and ensure the most up-to-date data is available for reports.
    • Cost Management: Keep an eye on the costs associated with the services used, especially with Azure Databricks, which can be more expensive. Optimize the use of resources where possible.

    From the Web:

    • The Azure Architecture Center provides a guide on modern data warehouse architecture for small and medium businesses, which includes using Azure Synapse Analytics, Azure SQL Database, and Azure Data Lake Storage. It suggests a hybrid approach that combines easy migration with the opportunity to add big data tools and processes for some use cases [1].

    In conclusion, your proposed setup is well-aligned with modern data architecture practices. It's recommended to review the resources provided, consider the additional points mentioned, and adjust the architecture as needed to ensure it meets the growing demands of your business.

    Regarding cost/billing, I would recommend contacting the Billing and Subscription team, who are best placed to provide more insight and guidance on this scenario.

    Please let us know if you have any further queries. I’m happy to assist you further.    

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.
