Extracting real-time data from an API into a SQL Server database using SSIS

rania mbarki 0 Reputation points
2024-03-01T11:36:48.6533333+00:00

I'm new to SSIS and trying to extract real-time data from an API into a SQL Server database for ETL processing. I have written C# code that fetches the data, but it saves it to a JSON file, and the API has over 20 different routes (one per table), so writing 20+ scripts and preparing 20+ JSON files to get all the data doesn't seem feasible. Is there a better way to get the data faster and keep it updated? Also, what's the best way to store it in a SQL database so I can use DirectQuery in Power BI?
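A minimal sketch (not taken from the thread) of how a single C# program could cover many routes instead of one script and one JSON file per route: loop over the route names, fetch each endpoint, and land the raw JSON directly in a SQL Server staging table. The base URL, route names, staging table names (dbo.Stage_&lt;route&gt; with a single NVARCHAR(MAX) column named RawJson), and connection string are all hypothetical placeholders.

```csharp
// Sketch only: one generic loop over API routes, landing raw JSON in staging
// tables so the payload can be shaped later in T-SQL (e.g. with OPENJSON).
using System.Data;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

class ApiToSqlLoader
{
    // Hypothetical route names and base URL; replace with the real API's 20+ routes.
    static readonly string[] Routes = { "customers", "orders", "products" };
    const string BaseUrl = "https://example.com/api/";
    const string ConnString = "Server=.;Database=Staging;Integrated Security=true;TrustServerCertificate=true;";

    static async Task Main()
    {
        using var http = new HttpClient();
        foreach (var route in Routes)
        {
            // One generic fetch per route instead of one script per route.
            string json = await http.GetStringAsync(BaseUrl + route);

            // Wrap the payload in a one-row, one-column DataTable matching the
            // assumed staging table layout (single NVARCHAR(MAX) column RawJson).
            var staging = new DataTable();
            staging.Columns.Add("RawJson", typeof(string));
            staging.Rows.Add(json);

            using var conn = new SqlConnection(ConnString);
            await conn.OpenAsync();

            // Full reload each run: empty the staging table, then bulk-insert the new payload.
            using (var truncate = new SqlCommand($"TRUNCATE TABLE dbo.Stage_{route};", conn))
                await truncate.ExecuteNonQueryAsync();

            using var bulk = new SqlBulkCopy(conn) { DestinationTableName = $"dbo.Stage_{route}" };
            await bulk.WriteToServerAsync(staging);
        }
    }
}
```

The same loop-over-routes idea could be expressed in SSIS as a Foreach Loop container with the route name in a variable, and the staging tables could sit behind views that Power BI reads with DirectQuery.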

Tags: SQL Server, SQL Server Integration Services

1 answer

  1. Greg Low 1,495 Reputation points Microsoft Regional Director
    2024-03-05T03:11:04.8+00:00

    If you have a number of data transfers to accomplish in SSIS and they're not all the same (like 20 different URLs), then even though you can parameterize the sources and destinations, you still have to deal with different data in each data flow.

    The alternative with SSIS is to automate the creation of the packages. There are many ways to do that, including BIML (https://en.wikipedia.org/wiki/Business_Intelligence_Markup_Language).

    For SSIS, we would often do metadata-driven development, where we use a table in a database to hold the required details (in this case source, destination, data flow columns, etc.). Then we use a tool to build the pipelines from that table. If you had many packages to develop with many variations, then BIML or other automation might make sense.
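    As a rough illustration of the metadata-driven idea outside of SSIS/BIML, the sketch below drives the loads from a config table instead of hard-coding each route. The table name dbo.EtlRouteConfig, its RouteUrl/TargetTable columns, the one-column staging tables, and the connection string are all hypothetical, not details from this answer.

    ```csharp
    // Sketch only: the list of feeds lives in a table, so adding a new API route
    // becomes an INSERT into dbo.EtlRouteConfig rather than a new script or package.
    using System.Collections.Generic;
    using System.Data;
    using System.Net.Http;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    class MetadataDrivenLoader
    {
        const string ConnString = "Server=.;Database=Staging;Integrated Security=true;TrustServerCertificate=true;";

        static async Task Main()
        {
            // Read source URL / target table pairs from the hypothetical config table.
            var feeds = new List<(string Url, string Table)>();
            using (var conn = new SqlConnection(ConnString))
            {
                await conn.OpenAsync();
                using var cmd = new SqlCommand("SELECT RouteUrl, TargetTable FROM dbo.EtlRouteConfig;", conn);
                using var reader = await cmd.ExecuteReaderAsync();
                while (await reader.ReadAsync())
                    feeds.Add((reader.GetString(0), reader.GetString(1)));
            }

            // One generic load per feed: fetch the JSON and land it in the configured
            // staging table (assumed to have a single NVARCHAR(MAX) column, RawJson).
            using var http = new HttpClient();
            foreach (var (url, table) in feeds)
            {
                string json = await http.GetStringAsync(url);

                var staging = new DataTable();
                staging.Columns.Add("RawJson", typeof(string));
                staging.Rows.Add(json);

                using var conn = new SqlConnection(ConnString);
                await conn.OpenAsync();
                using var bulk = new SqlBulkCopy(conn) { DestinationTableName = table };
                await bulk.WriteToServerAsync(staging);
            }
        }
    }
    ```

    A metadata-driven SSIS or BIML solution automates roughly this same pattern: the config rows become the sources and destinations of the generated packages.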

    But if it's just 20 as a one-off task, it would be hard to justify the learning curve required. I suspect you might be better off just creating one pipeline to do one table, then cloning it (copy/paste in Solution Explorer) to create each of the others, editing each copy for its path, data flows, and destination. Then make a master package to run the others.

    Azure Data Factory (ADF) is a little more forgiving in terms of dynamic data paths, automatic creation of target tables, and so on. But again, that's another learning curve.
