debug button options

arkiboys 9,706 Reputation points
2023-06-30T06:07:09.46+00:00

hello,

In an ADF pipeline, the Debug button has a drop-down with two items:

1- use dataflow debug session

2- use activity runtime

Question:

In simple terms, what is the difference between the two and when to use each?

1- I use [use dataflow debug session] if I want to run the data flow and preview data

2- I use [use activity runtime] if I change the value of a pipeline parameter and run the pipeline, with no need to preview the data flow data...

Have I got the concept correct?

thank you

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author
  1. Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator
    2023-06-30T19:53:10.36+00:00

    @arkiboys

    These two options appear only when the pipeline contains a Data Flow activity.

    1- use dataflow debug session

    2- use activity runtime

    The difference between the two options is which integration runtime (IR) compute is used for the debug run.

    use dataflow debug session:

    This option lets you debug a data flow in a separate debug session. The debug cluster is separate from the cluster configured to run the data flow in the pipeline. It is typically smaller and less powerful than that cluster, which makes it more cost-effective.

    With the data flow debug session, the processing time for my workload was 7 seconds. The small cluster took less time to spin up.

    User's image

    use activity runtime:

    When you use the "Use Activity Runtime" option, ADF runs the data flow as part of the pipeline activity runtime, using the original cluster that was configured in the data flow integration runtime. In other words, the "Use Activity Runtime" option uses the original cluster, which is typically more powerful and more expensive than the debug cluster.

    Here, we can optimize the integration runtime on the actual pipeline based on the time taken to run the data flow. If the data flow is taking too long to run, we may need to adjust the integration runtime to use a more powerful cluster or to optimize the data flow itself.
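
As a rough illustration of where that compute is configured: in the pipeline JSON, a Data Flow activity carries a `compute` block (core count and compute type) and, optionally, an integration runtime reference. The Python sketch below only assembles such a definition as a plain dict. The helper `build_data_flow_activity` and the names `RunMyDataFlow` and `MyDataFlow` are hypothetical, and the property names are assumed to match the Execute Data Flow activity JSON, so compare them with the JSON view of your own pipeline before relying on them.

```python
import json


def build_data_flow_activity(name, data_flow_name, core_count=8,
                             compute_type="General", integration_runtime=None):
    """Return a Data Flow activity definition as a plain dict (illustrative only)."""
    type_properties = {
        # Which data flow the activity runs.
        "dataflow": {"referenceName": data_flow_name, "type": "DataFlowReference"},
        # Cluster size assumed to apply when debugging with "Use activity runtime".
        "compute": {"coreCount": core_count, "computeType": compute_type},
    }
    if integration_runtime is not None:
        # Optional: pin the activity to a specific (hypothetical) integration runtime.
        type_properties["integrationRuntime"] = {
            "referenceName": integration_runtime,
            "type": "IntegrationRuntimeReference",
        }
    return {"name": name, "type": "ExecuteDataFlow", "typeProperties": type_properties}


# Example: raise the core count if the data flow takes too long on the default size.
activity = build_data_flow_activity("RunMyDataFlow", "MyDataFlow", core_count=16)
print(json.dumps(activity, indent=2))
```

Changing `core_count` or `compute_type` here corresponds to the "more powerful cluster" adjustment described above; the debug session cluster is sized separately and is not affected by these settings.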

    With the activity runtime, the processing time was 24 seconds. It took longer because, in my case, spinning up the actual compute takes more time.

    User's image

    See this video demonstration of both options.

    I hope this helps.

    If this answers your question, please consider accepting it by selecting Accept answer and up-voting, as this helps the community find answers to similar questions.

    1 person found this answer helpful.
