@E P - I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.
Ask: We have a Synapse workspace under one subscription (dev) with a managed private endpoint to a Storage Account in another subscription (production). Data from production is processed in serverless Spark jobs.
We replicated the Synapse setup from dev to the production subscription, and it is functionally OK. However, we are seeing data transfer costs there. We don't see these kinds of costs in the dev subscription, even though we were processing the same amount of data.
We are trying to understand the difference and would appreciate any help.
Solution: What we found is that, when accessing a restricted storage account, there is a functional difference between running a "standalone" Spark job and running it as an activity in a pipeline. The standalone job requires the managed private endpoint. When the job runs from a pipeline, access is authorized solely through the workspace managed identity.
If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.
If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.
Please don't forget to "Accept Answer" and select "Yes" for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.