Azure Synapse, data flow cache sink

Max 0 Reputation points
2023-06-18T04:50:42.7266667+00:00

Hi,

I am using an SCD1 pattern pipeline and am trying to capture how many rows were updated by using a cache sink and sending the result to the next activity (Set Variable) in the pipeline. The value will then be sent to a custom table where we capture the logging data.

My issue is that when I use a cache sink inside a data flow, Azure asks me to change the data flow logging level setting to "None" and does not allow me to set the logging level to "Verbose" or "Basic". Is there a way I can use both a cache sink and a logging level of "Verbose" or "Basic"?

Thank you

Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


2 answers

  1. ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator
    2023-06-19T05:26:36.03+00:00

    Hi Max,

    Thank you for posting your query on the Microsoft Q&A platform.

    Yes, this is expected. In data flows, if a cache sink has "Write to activity output" enabled, the logging level must be set to "None". It is not possible to combine a cache sink that writes to activity output with the "Verbose" or "Basic" logging level; the restriction exists to avoid conflicts with the cache sink.

    However, you can still capture the logging data by other means, such as writing it to a file sink or a database sink and then using a subsequent activity in the pipeline to read it from that sink and send it to the custom table.

    Alternatively, you can try to use a different approach to capture how many rows were updated, such as using a conditional split or a derived column transformation to add a flag to the rows that were updated, and then using a subsequent activity in the pipeline to count the number of rows with the flag. This approach does not require the use of a cache sink and should not conflict with the data flow logging level.

    Hope this helps. Please let me know if you have any further queries.
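The flag-and-count alternative described in this answer can be sketched outside of the data flow designer. The Python below is only an illustration of the logic, not Synapse code: the `Row` type, the sample data, and the function names `flag_updates` and `count_updated` are all hypothetical, and it assumes an SCD1 update means "the key already exists and at least one attribute differs".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical dimension row: a business key plus two attributes."""
    key: int
    name: str
    city: str


def flag_updates(existing, incoming):
    """Pair each incoming row with a flag saying whether it updates an existing row.

    A row counts as an update when its key already exists and at least one
    attribute differs (the SCD1 overwrite case). New keys and unchanged rows
    are flagged False.
    """
    current = {row.key: row for row in existing}
    flagged = []
    for row in incoming:
        old = current.get(row.key)
        was_updated = old is not None and old != row
        flagged.append((row, was_updated))
    return flagged


def count_updated(flagged):
    """Count the rows carrying the 'updated' flag."""
    return sum(1 for _, was_updated in flagged if was_updated)


existing = [Row(1, "Ada", "London"), Row(2, "Linus", "Helsinki")]
incoming = [
    Row(1, "Ada", "Paris"),        # same key, changed city: update
    Row(2, "Linus", "Helsinki"),   # same key, unchanged: no update
    Row(3, "Grace", "New York"),   # new key: insert, not an update
]

flagged = flag_updates(existing, incoming)
updated_rows = count_updated(flagged)
print(updated_rows)
```

In a real pipeline the flag would come from a derived column or conditional split, and the count from a later activity that queries the sink; the sketch only shows what is being counted.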




  2. Sedat SALMAN 14,455 Reputation points MVP
    2023-06-18T12:27:52.57+00:00

    https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink

    https://learn.microsoft.com/en-us/azure/data-factory/control-flow-execute-data-flow-activity

    In Azure Synapse's Data Flow, a cache sink is a mechanism that allows a data flow to write data into the Spark cache instead of a data store. This feature is useful when you want to reference data multiple times within the same flow using a cache lookup without explicitly joining columns to it. It's typically used for operations such as looking up a max value on a data store or matching error codes to an error message database.

    For the logging level, Azure Synapse provides three options: "Verbose", "Basic", and "None". The "Verbose" mode fully logs activity at each individual partition level during data transformation, while the "Basic" mode only logs transformation durations, and "None" will only provide a summary of durations.

    However, it seems that the logging level settings are not explicitly mentioned in relation to cache sinks. I could not find any information that directly answers your question about enabling the logging level to "Verbose" or "Basic" for a cache sink.

    As for passing data to the next pipeline activity, a cache sink can optionally write your output data to the input of the next pipeline activity, which allows you to pass data out of your data flow activity without needing to persist the data in a data store.
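As a rough illustration of passing cache sink data to the next activity: with "Write to activity output" enabled, the value is typically read in the pipeline with an expression of the form `@activity('<data flow activity name>').output.runStatus.output.<sinkName>.value[0].<column>`. The Python sketch below only mimics that lookup on a hand-written dictionary. The output shape is assumed from that expression, and the sink name `updatedRowsSink`, the column `updatedCount`, and the sample value are hypothetical.

```python
def read_cache_sink_value(activity_output, sink_name, column, row=0):
    """Return one column from one row of a cache sink's activity output.

    Mirrors the assumed path output.runStatus.output.<sink>.value[row].<column>.
    Raises KeyError or IndexError if the sink, row, or column is missing.
    """
    sinks = activity_output["runStatus"]["output"]
    return sinks[sink_name]["value"][row][column]


# Hand-written stand-in for the "output" object of a data flow activity run.
sample_output = {
    "runStatus": {
        "output": {
            "updatedRowsSink": {
                "value": [{"updatedCount": 42}],
            }
        }
    }
}

updated_count = read_cache_sink_value(sample_output, "updatedRowsSink", "updatedCount")
print(updated_count)
```

The value obtained this way is what a Set Variable activity would receive before it is written to the logging table.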
