How to determine if Spark is rewriting data
First, open the SQL DAG for your write stage: scroll to the top of the job’s page and click the Associated SQL Query.
The DAG should now be displayed. If you don’t see it right away, scroll through the page until it appears.
If you’re doing a Delete or Update operation, compare the amount of data written by the writer with the amount you expect the operation to change. If far more data is being written than you expect, Spark is probably rewriting data.
If you’re doing a merge, the merge node has explicit statistics about how much data it’s rewriting.
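The same check can be done outside the UI. The sketch below is a minimal illustration, and it assumes something this page does not state: that the table is a Delta Lake table whose history records per-operation metrics. In that case `numCopiedRows` (Delete and Update) and `numTargetRowsCopied` (Merge) count rows that were rewritten without being changed. The helper `copied_rows` and the sample metrics are hypothetical and exist only for illustration.

```python
from typing import Mapping, Optional

# Metric names Delta Lake uses for rows that were copied (rewritten unchanged).
# Assumption: the table is a Delta table and its history reports these keys.
COPIED_ROW_KEYS = ("numCopiedRows", "numTargetRowsCopied")


def copied_rows(operation_metrics: Mapping[str, str]) -> Optional[int]:
    """Return the number of rows rewritten unchanged, or None if not reported."""
    for key in COPIED_ROW_KEYS:
        if key in operation_metrics:
            # History metrics are stored as strings, so convert before returning.
            return int(operation_metrics[key])
    return None


# Hypothetical metrics, shaped like the operationMetrics map in the table history.
merge_metrics = {"numTargetRowsUpdated": "1000", "numTargetRowsCopied": "4999000"}
print(copied_rows(merge_metrics))  # 4999000
```

With a live session, the metrics map could be fetched with something like `spark.sql("DESCRIBE HISTORY my_table LIMIT 1").first()["operationMetrics"]`, where `my_table` is a placeholder. A copied-row count far larger than the number of rows the operation was meant to change points to rewritten data.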