Looking for guidance on integrating third-party data into our domain

Anon101
2024-04-18T16:54:37.4433333+00:00

Hi,
My company's transactional system is MS Dynamics 365. We need to import various third-party data into our domain. I'm a Power BI developer but have been assigned this project. I'm looking for guidance on a suitable database and integration tool.

Because MS Dynamics 365 is built on Dataverse, and Power Query (which I use in Power BI) is also the language of Power Platform dataflows, I intended to use those two technologies, as both are familiar skill sets within my company. However, the further I get into the details of what Dataverse can and cannot do, the more I run into limitations and terms that neither I nor anyone else in the company understands. I've therefore decided not to use Dataverse, although I am still open to using dataflows if that's possible.

What do I need to do?

  1. I need to ingest a dozen third-party data sources, and this number will grow over time. Let's assume this third-party data is kept in SQL databases. Each source will be a small dataset, probably around 20 columns and 20,000 rows. I simply need to get these small datasets saved into a database within our domain. The import must be able to run either manually or on a schedule.
  2. Once the raw datasets are in our domain, I will need to join two or more of them into one. I could save the result of the join into a physical table, or simply create a view over the datasets. That joined table or view may then be joined to another dataset in the same database, again as a physical table or a view, and so on through several layers.
  3. I don't foresee any complicated transformations in any ETL process. We just need to get the data imported.
  4. Once the joins are complete and the final datasets are formed (as physical tables or views), they will feed downstream processes. At the moment these are expected to be Dynamics and Power BI. I know Power BI can consume SQL views, but I'm not sure whether Dataverse has any limitations here.
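The layered joins in step 2 (a view over raw tables, then another view over that view) can be sketched as below. This is a minimal illustration, not a recommendation of a specific design: it uses Python's built-in sqlite3 module as a local stand-in for Azure SQL Database so that it runs anywhere, and the table names, view names and sample rows are all hypothetical. In Azure SQL Database the same CREATE VIEW statements would be written in T-SQL.

```python
import sqlite3

def build_layers(conn: sqlite3.Connection) -> None:
    """Create hypothetical raw tables plus two layers of views on top of them."""
    conn.executescript("""
        -- Raw copies of three third-party datasets (illustrative schemas).
        CREATE TABLE raw_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE raw_orders (order_id INTEGER PRIMARY KEY,
                                 customer_id INTEGER, amount REAL);
        CREATE TABLE raw_regions (customer_id INTEGER PRIMARY KEY, region TEXT);

        -- Layer 1: a view joining two raw datasets.
        CREATE VIEW v_customer_orders AS
            SELECT o.order_id, c.customer_id, c.name, o.amount
            FROM raw_orders AS o
            JOIN raw_customers AS c ON c.customer_id = o.customer_id;

        -- Layer 2: a view that joins the first view to a third dataset.
        CREATE VIEW v_customer_orders_by_region AS
            SELECT v.order_id, v.name, v.amount, r.region
            FROM v_customer_orders AS v
            JOIN raw_regions AS r ON r.customer_id = v.customer_id;
    """)

def load_sample(conn: sqlite3.Connection) -> None:
    """Insert made-up rows standing in for imported third-party data."""
    with conn:  # commit all inserts together
        conn.executemany("INSERT INTO raw_customers VALUES (?, ?)",
                         [(1, "Contoso"), (2, "Fabrikam")])
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                         [(10, 1, 100.0), (11, 2, 50.0), (12, 1, 25.0)])
        conn.executemany("INSERT INTO raw_regions VALUES (?, ?)",
                         [(1, "EMEA"), (2, "US")])

conn = sqlite3.connect(":memory:")
build_layers(conn)
load_sample(conn)
for row in conn.execute(
        "SELECT * FROM v_customer_orders_by_region ORDER BY order_id"):
    print(row)
```

This prints one row per order, starting with (10, 'Contoso', 100.0, 'EMEA'). Because views store no data, each layer always reflects the current raw tables, and a downstream consumer such as Power BI would only need to read the top-level view.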

I'm currently leaning towards recommending Azure SQL Database as the storage technology.
It's the integration tool I am struggling to decide on. I could write SQL scripts to extract data from the third parties and insert it into the Azure SQL database. I could then write another SQL script to read from various tables within that database, join them, and load the result into another table, or use a SQL view to do the same.
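For datasets of around 20,000 rows, the script-based approach described above can be as simple as a full reload of each table on every run. A minimal sketch follows, assuming sqlite3 connections as stand-ins for the third-party source and the Azure SQL Database target (a real script would use a SQL Server driver instead); the table `supplier_prices` and its columns are hypothetical. One practical point: Azure SQL Database does not support linked servers, so the extract step generally runs outside the target database (in a script like this, or in ADF or SSIS) rather than as a single T-SQL statement inside it.

```python
import sqlite3
from typing import Sequence

def _identifier(name: str) -> str:
    """Table/column names can't be bound as parameters, so allow only simple identifiers."""
    if not name.isidentifier():
        raise ValueError(f"unsafe identifier: {name!r}")
    return name

def full_refresh(src: sqlite3.Connection, dst: sqlite3.Connection,
                 table: str, columns: Sequence[str]) -> int:
    """Replace the target table's rows with the source table's rows; return the row count."""
    t = _identifier(table)
    cols = ", ".join(_identifier(c) for c in columns)
    marks = ", ".join("?" for _ in columns)
    # Extract: a small dataset fits comfortably in memory, so read it all at once.
    rows = src.execute(f"SELECT {cols} FROM {t}").fetchall()
    # Load: `with dst` commits on success and rolls back on error, so a failed
    # reload leaves the previous copy of the table in place.
    with dst:
        dst.execute(f"DELETE FROM {t}")
        dst.executemany(f"INSERT INTO {t} ({cols}) VALUES ({marks})", rows)
    return len(rows)

# Demo: in-memory databases standing in for a third-party source and the target.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE supplier_prices (sku TEXT PRIMARY KEY, price REAL)")
with src:
    src.executemany("INSERT INTO supplier_prices VALUES (?, ?)",
                    [("A1", 9.5), ("B2", 12.0)])
print(full_refresh(src, dst, "supplier_prices", ["sku", "price"]))  # → 2
```

A full reload avoids tracking which rows changed, which is usually a reasonable trade-off at this data volume.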

As you can see, our requirements are basic. The recommendation needs to be a Microsoft cloud-based technology. We would prefer a cheaper approach, but given our basic needs I wouldn't expect that to be a problem.

If you can suggest options and explain why you recommend them, I will look into them further.

Thanks.

Tags: Azure SQL Database, Azure Data Factory, SQL Server Integration Services

3 answers

  1. Amira Bedhiafi
    2024-04-18T17:56:31.9666667+00:00

    You're already leaning towards Azure SQL Database, and this choice aligns well with your needs. It offers easy scalability, robust security features, and integrates well with other Microsoft services such as Dynamics 365 and Power BI. It lets you store the datasets and handle straightforward data manipulation such as joins and views.

    For the data integration part, given your preference for Microsoft technologies and the simplicity of your ETL requirements, two tools come to mind: Azure Data Factory (ADF) and SQL Server Integration Services (SSIS).

    Why ADF?

    • Scheduling and automation: ADF pipelines can run on a schedule through triggers or be run on demand, which covers both manual and scheduled imports, and new sources can be added as their number grows.
    • Connectors: ADF offers built-in connectors for many data sources, including SQL databases, which matches your scenario.
    • Loading: it can copy data directly into Azure SQL Database and can run SQL scripts for transformation or loading steps.
    • Visual design: ADF provides a visual interface for designing ETL processes, which helps if you or others on your team are not deeply familiar with coding.
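    Because the number of sources will grow, ADF is often set up in a metadata-driven way: a Lookup activity reads a control table of sources, and a ForEach activity runs a parameterized Copy activity for each row, so onboarding a source means adding a row rather than building a new pipeline. The Python below is not ADF code; it is a minimal sketch of that control-table idea, assuming sqlite3 as a stand-in database, with a hypothetical `SourceConfig` record and `run_all` function.

```python
import sqlite3
from collections.abc import Callable, Iterable
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceConfig:
    """One row of a hypothetical control table describing a third-party source."""
    name: str            # friendly name, also used to look up the connection
    table: str           # copied with the same name and column order in the target
    enabled: bool = True

def run_all(sources: Iterable[SourceConfig],
            connect_source: Callable[[str], sqlite3.Connection],
            target: sqlite3.Connection) -> dict[str, str]:
    """Copy every enabled source into the target; one failure doesn't stop the rest."""
    results: dict[str, str] = {}
    for cfg in sources:
        if not cfg.enabled:
            results[cfg.name] = "skipped"
            continue
        try:
            if not cfg.table.isidentifier():
                raise ValueError(f"unsafe table name {cfg.table!r}")
            rows = connect_source(cfg.name).execute(
                f"SELECT * FROM {cfg.table}").fetchall()
            with target:  # full reload inside one transaction
                target.execute(f"DELETE FROM {cfg.table}")
                if rows:
                    marks = ", ".join("?" * len(rows[0]))
                    target.executemany(
                        f"INSERT INTO {cfg.table} VALUES ({marks})", rows)
            results[cfg.name] = f"ok ({len(rows)} rows)"
        except Exception as exc:
            # Record the failure and carry on with the next source.
            results[cfg.name] = f"failed: {exc}"
    return results
```

    Usage would be to keep the list of `SourceConfig` entries in one place and pass a function that opens a connection by source name; adding the thirteenth source is then a one-line change.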

    Why SSIS?

    If your environment is heavily reliant on SQL and you have expertise in managing SQL Server, SSIS could also be a good option.

    • SSIS is highly customizable and powerful for complex data transformation needs.
    • If your team is comfortable with SQL, SSIS packages and tasks should be straightforward to manage.
    • SSIS packages can be deployed to Azure and run either on an Azure-SSIS Integration Runtime in Data Factory or on SQL Server installed on an Azure VM.

    Other things to keep in mind:

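    • Both ADF and SSIS can be cost-effective at your scale. ADF is fully managed, so there is no infrastructure to maintain, and it is billed mainly on pipeline activity runs and data movement, which should stay inexpensive for a dozen small loads. If you run SSIS on an Azure-SSIS Integration Runtime, note that the runtime is billed for as long as it is running.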
    • While ADF provides a more modern, low-code approach which might be easier to use and manage, SSIS offers deeper control and might require more technical proficiency.

  2. ZoeHui-MSFT
    2024-04-19T02:06:48.38+00:00

    Hi @Anon101

    You may consider SQL Server Integration Services (SSIS), a platform for building enterprise-level data integration and data transformation solutions that can also handle complex business requirements.

    Since you want an Azure-based solution, you could use Azure Data Factory (ADF), or run SSIS packages in ADF on an Azure-SSIS Integration Runtime (IR).

    For a tailored recommendation and pricing details, you can contact the Microsoft Support team:

    https://support.microsoft.com/en-us?icid=TopNavSupport

    Regards,

    Zoe Hui



  3. Pinaki Ghatak, Microsoft Employee
    2024-06-03T10:47:20.6633333+00:00

    Hello @Anon101

    Based on your requirements, I would recommend using Azure Data Factory as your integration tool and Azure SQL Database as your storage technology.

    Azure Data Factory is a cloud-based ETL and data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

    Azure SQL Database is a fully managed relational database service that provides a highly available, scalable, and secure database engine. It is a cost-effective solution for small to large-sized databases and provides built-in features such as automatic tuning and threat detection.

    With Azure Data Factory, you can easily extract data from third-party SQL databases and load it into your Azure SQL Database. You can also join multiple datasets using data flows or SQL queries, and then save the results as physical tables or define views over the loaded tables in Azure SQL Database.
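    Choosing between a physical table and a view is mostly a question of freshness versus stability: a view always reflects the current base tables, while a materialized table is a snapshot that changes only when it is refreshed. A minimal sketch of that difference, assuming sqlite3 as a stand-in for Azure SQL Database; the `materialize` function and the `prices`, `stock`, `v_price_stock` and `snap_price_stock` names are hypothetical. SQLite's CREATE TABLE ... AS SELECT has no direct T-SQL equivalent in Azure SQL Database, where you would use SELECT ... INTO for a new table or TRUNCATE TABLE followed by INSERT ... SELECT, possibly run from an ADF Stored Procedure or Script activity.

```python
import sqlite3

def materialize(conn: sqlite3.Connection, view: str, table: str) -> int:
    """Rebuild `table` as a snapshot of `view`; return the snapshot's row count.

    Assumes the caller has no uncommitted transaction open on `conn`.
    """
    for name in (view, table):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    conn.execute("BEGIN")  # make the drop-and-recreate atomic
    try:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"CREATE TABLE {table} AS SELECT * FROM {view}")
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Demo with made-up data: a view over two tables, then a snapshot of it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (sku TEXT, price REAL);
    CREATE TABLE stock (sku TEXT, qty INTEGER);
    INSERT INTO prices VALUES ('A1', 9.5), ('B2', 12.0);
    INSERT INTO stock VALUES ('A1', 3), ('B2', 7);
    CREATE VIEW v_price_stock AS
        SELECT p.sku, p.price, s.qty
        FROM prices AS p JOIN stock AS s ON s.sku = p.sku;
""")
print(materialize(conn, "v_price_stock", "snap_price_stock"))  # → 2
with conn:
    conn.execute("INSERT INTO prices VALUES ('C3', 4.0)")
    conn.execute("INSERT INTO stock VALUES ('C3', 1)")
# The view sees the new row immediately; the snapshot only after a refresh.
print(conn.execute("SELECT COUNT(*) FROM v_price_stock").fetchone()[0])     # → 3
print(conn.execute("SELECT COUNT(*) FROM snap_price_stock").fetchone()[0])  # → 2
```

    A snapshot can make sense when downstream consumers need stable data between loads or when a deep stack of views becomes slow to query; otherwise a view avoids an extra refresh step.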

    Once your datasets are ready, you can use them to feed data to downstream processes such as Dynamics and Power BI. Azure Data Factory is a cost-effective solution for your basic needs, and it integrates well with other Microsoft cloud-based technologies.

    Additionally, it provides a user-friendly interface for building and managing your data workflows.

