Question about ETL (Azure Data Factory)

Sebastian Pacheco 221 Reputation points
2024-06-24T12:56:57.72+00:00

Hello everyone... I have a question. I have been reading about Azure Data Factory, Azure Databricks, Azure Synapse Analytics, Azure Data Lake, and SQL Database, and I am not sure which one is best for me to use. My setup is as follows:

I have an Azure Database for PostgreSQL Flexible Server where my data lives. I need a tool that retrieves certain information so that I can run calculations on it with Python, and then leave the new results in some storage that Power BI can connect to in order to build reports and charts.

Based on this, I was thinking that with ADF I could build a pipeline to extract the data and pass it to Databricks so that the data analyst can program the calculations in Python; the results would then be sent to Synapse, Data Lake, or SQL Database, and Power BI would connect to that service to create the reports and charts.
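For reference, this is a rough sketch of what I imagine the Databricks step could look like; the server name, credentials, table names, and the aggregation below are just hypothetical placeholders, not our real setup:

```python
# Minimal sketch of a Databricks (PySpark) step: read from Azure Database for
# PostgreSQL over JDBC, transform in Python, and write somewhere Power BI can read.
# All connection details and table names here are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Read the source table from PostgreSQL over JDBC (hypothetical server/credentials)
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://myserver.postgres.database.azure.com:5432/mydb")
    .option("dbtable", "public.sales")          # hypothetical source table
    .option("user", "myuser")
    .option("password", "mypassword")
    .option("driver", "org.postgresql.Driver")
    .load()
)

# Placeholder for the analyst's Python calculations
summary_df = source_df.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

# Persist the result where Power BI can connect, e.g. a Delta table
summary_df.write.format("delta").mode("overwrite").saveAsTable("reporting.sales_summary")
```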

My questions are:

1.- Would a scheme like this be good?

2.- Is ADF strong enough for doing data transformations manually with Python, or is Databricks better for that?

3.- Maybe at first we only need storage (and nothing else) for Power BI to connect to... between Synapse, SQL Database, and Data Lake, if that is the case, would SQL Database be the most economical option?

Thanks!

Azure

2 answers

  1. Sebastian Pacheco 221 Reputation points
    2024-07-09T14:30:42.5033333+00:00

    Hello?? Has anyone worked with these tools? Can I just create a Python script with ADF and do the data transformations/calculations there?


  2. Sebastian Pacheco 221 Reputation points
    2024-09-23T12:12:58.66+00:00

    I'm writing in case anyone else has the same question or wants to do something similar. In the end I did the following: I created a pipeline in Azure Data Factory with these processes:

    01.- Create one function that creates an Azure Batch node and another that deletes it.

    Process 1: call the function that creates a node in Azure Batch; on this node I install Python and load my script that does the calculations.

    Process 2: call the script so it runs, pulling the data from my BD01 and then sending the results to my BD02.

    Process 3: call the other function and delete the node.

    So far this works fine, and the node only stays up during the script's calculation process.
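    For anyone curious, this is a rough sketch of what the calculation script on the Batch node can look like; the connection strings, table names, and the aggregation are hypothetical placeholders, the real script does our own calculations:

    ```python
    # Minimal sketch of the calculation script that runs on the Azure Batch node:
    # read from BD01, compute in Python, write the results to BD02.
    # Connection strings, tables, and the aggregation are hypothetical placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    SOURCE_URL = "postgresql+psycopg2://user:password@bd01.postgres.database.azure.com:5432/sourcedb"
    TARGET_URL = "postgresql+psycopg2://user:password@bd02.postgres.database.azure.com:5432/targetdb"

    def main():
        source = create_engine(SOURCE_URL)
        target = create_engine(TARGET_URL)

        # 1. Pull only the rows needed for the calculation (hypothetical table/columns)
        df = pd.read_sql("SELECT customer_id, amount, created_at FROM sales", source)

        # 2. Do the calculations in Python (placeholder aggregation)
        results = (
            df.groupby("customer_id", as_index=False)
              .agg(total_amount=("amount", "sum"), orders=("amount", "count"))
        )

        # 3. Write the results to the reporting database that Power BI reads from
        results.to_sql("sales_summary", target, if_exists="replace", index=False)

    if __name__ == "__main__":
        main()
    ```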

