Azure synapse, DataFlow cache sink

Question

Max 0

Hi,

I am using an SCD1 pattern pipeline and am trying to capture how many rows were updated by using a cache sink and sending the result to the next activity (Set Variable) in the pipeline. The value is then sent to a custom table where we capture logging data.

My issue is that when I use a cache sink inside a data flow, Azure asks me to change the data flow logging level to "None" and does not allow me to set it to "Verbose" or "Basic". Is there a way to use a cache sink together with the "Verbose" or "Basic" logging level?

Thank you

ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator

2023-06-22T06:34:28.85+00:00

Hi Max, just checking whether the answer below was helpful. If so, please consider clicking the Accept Answer button; accepted answers help the community as well. Please let me know if you have any further queries. Thank you.

PRADEEPCHEEKATLA 91,866 Reputation points

2023-06-27T05:39:02.9+00:00

@Max - We received your feedback that the answer provided on the thread was not helpful.

Kindly let us know what we could have done better to improve the answer and make your engagement experience good. We are here to help you and strive to make your experience better and greatly value your feedback.

Our engineer provided a detailed answer with clear steps for what you are looking for. If you wish, you may resubmit your survey rating for the engagement you received on this thread. Your feedback is very important to us.

Looking forward to your reply. We much appreciate your feedback!

Regards,

PRADEEPCHEEKATLA-MSFT

2 answers

Answer 1

ShaikMaheer-MSFT 38,631 Microsoft Employee Moderator

Hi Max,

Thank you for posting your query on the Microsoft Q&A platform.

Yes, this is expected. In data flows, if you use a cache sink with "Write to activity output" enabled, the logging level must be set to "None". It is not possible to combine "Write to activity output" from a cache sink with the "Verbose" or "Basic" logging level; the restriction exists to avoid conflicts with the cache sink.

However, you can still capture the logging data by using other methods, such as writing the logging data to a file or sending it to a different sink. For example, you can use a file sink or a database sink to capture the logging data, and then use a subsequent activity in the pipeline to read the logging data from the sink and send it to the custom table.

Alternatively, you can try to use a different approach to capture how many rows were updated, such as using a conditional split or a derived column transformation to add a flag to the rows that were updated, and then using a subsequent activity in the pipeline to count the number of rows with the flag. This approach does not require the use of a cache sink and should not conflict with the data flow logging level.
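The flag-and-count idea above can be illustrated with a minimal Python sketch. This is not data flow script; the row layout, the key column `id`, the flag name `is_updated`, and the helper functions are all hypothetical, chosen only to show the SCD1 logic of marking rows whose key already exists but whose attributes have changed.

```python
from typing import Dict, List


def flag_updated_rows(existing: List[dict], incoming: List[dict], key: str = "id") -> List[dict]:
    """Return incoming rows with an added 'is_updated' flag (hypothetical column name)."""
    existing_by_key: Dict[object, dict] = {row[key]: row for row in existing}
    flagged = []
    for row in incoming:
        current = existing_by_key.get(row[key])
        # SCD1: a row counts as updated when its key exists and any attribute differs.
        is_updated = current is not None and current != row
        flagged.append({**row, "is_updated": is_updated})
    return flagged


def count_updated(flagged: List[dict]) -> int:
    """Count the rows carrying the update flag."""
    return sum(1 for row in flagged if row["is_updated"])


# Hypothetical sample data for illustration only.
existing_rows = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
incoming_rows = [
    {"id": 1, "name": "Ann"},     # unchanged
    {"id": 2, "name": "Robert"},  # changed, so it is an update
    {"id": 3, "name": "Cy"},      # new key, so it is an insert, not an update
]
flagged_rows = flag_updated_rows(existing_rows, incoming_rows)
updated_count = count_updated(flagged_rows)
```

In a data flow, the equivalent flag would be produced by a derived column or conditional split, and the count would be taken by whichever downstream step reads the flagged rows.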

Hope this helps. Please let me know if you have any further queries.

Please consider clicking the Accept Answer button. Accepted answers help the community as well.

ShaikMaheer-MSFT 38,631 Reputation points Microsoft Employee Moderator

2023-06-20T07:55:42.94+00:00

Hi Max, just checking whether the answer above was helpful. If so, please consider clicking the Accept Answer button; accepted answers help the community as well. Please let me know if you have any further queries. Thank you.

Campbell Mellor 0 Reputation points

2024-10-15T03:01:58.6333333+00:00

Hi @ShaikMaheer-MSFT, is it still the case that the data flow logging level must be None for the Write to activity output functionality to work?

Answer 2

https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink

https://learn.microsoft.com/en-us/azure/data-factory/control-flow-execute-data-flow-activity

In Azure Synapse's Data Flow, a cache sink is a mechanism that allows a data flow to write data into the Spark cache instead of a data store. This feature is useful when you want to reference data multiple times within the same flow using a cache lookup without explicitly joining columns to it. It's typically used for operations such as looking up a max value on a data store or matching error codes to an error message database.

For the logging level, Azure Synapse provides three options: "Verbose", "Basic", and "None". The "Verbose" mode fully logs activity at each individual partition level during data transformation, while the "Basic" mode only logs transformation durations, and "None" will only provide a summary of durations.

However, it seems that the logging level settings are not explicitly mentioned in relation to cache sinks. I could not find any information that directly answers your question about enabling the logging level to "Verbose" or "Basic" for a cache sink.

As for passing data to the next pipeline activity, a cache sink can optionally write your output data to the input of the next pipeline activity, which allows you to pass data out of your data flow activity without needing to persist the data in a data store.
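As a rough sketch of what the next activity sees: when a cache sink writes to activity output, the rows are commonly read with a pipeline expression along the lines of `@activity('<data flow activity>').output.runStatus.output.<sink name>.value`. The Python below only mimics that navigation on a sample dictionary. The sink name `cacheSink`, the column `updatedRows`, and the sample payload are assumptions for illustration, not output captured from a real run, so the path should be checked against the actual activity output in the pipeline monitor.

```python
from typing import List


def read_cache_sink_rows(activity_output: dict, sink_name: str) -> List[dict]:
    """Walk runStatus -> output -> <sink> -> value in the activity's output object.

    The nesting is an assumed shape that mirrors the expression path
    output.runStatus.output.<sink name>.value.
    """
    return activity_output["runStatus"]["output"][sink_name]["value"]


# Hypothetical payload: one row with one column holding the update count.
sample_output = {
    "runStatus": {
        "output": {
            "cacheSink": {"value": [{"updatedRows": 42}]}
        }
    }
}

rows = read_cache_sink_rows(sample_output, "cacheSink")
updated_rows = rows[0]["updatedRows"]  # the value a Set Variable activity would receive
```

In the pipeline itself, the same lookup would be written as an expression in the Set Variable activity, indexing into the first row and the column of interest.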
