Executing Azure SQL Stored Procedures in Spark Notebooks

Richards, Sam (DG-STL-HQ) 151 Reputation points
2023-09-14T19:40:04.0033333+00:00

I am running into an odd issue when using Spark notebooks to call a stored procedure that updates an Azure SQL table. I have an Azure Synapse pipeline that calls Spark notebooks to do a variety of things, and the second-to-last step updates a logging table in our Azure SQL DB.

When I run the Spark notebook manually (either by entering parameters myself or by using the notebook generated by the Synapse pipeline at runtime), the stored procedure executes and the table is updated. This stored procedure records the rows copied and the execution timestamp.

However, when the pipeline calls the Spark notebook and runs it, the notebook completes successfully, including the stored-procedure step, but the table shows an update only to the execution timestamp: the recordCount field is ALWAYS 0, not the value shown in the screenshot.


The screenshot below is from the Spark notebook generated at runtime:

Here is the Python:

import contextlib

# Get a secret from Key Vault (mssparkutils is provided by the Synapse runtime)
def getKeyVaultSecret(secretName):
  return mssparkutils.credentials.getSecret('synapsekeyvaultname', secretName, 'keyvaultname')

class ODBCSqlServer():
  """This class encapsulates an ODBC connection to a SQL Server instance"""

  def __init__(self, odbcUrl, service_principal_id=None, service_principal_secret=None, tenant_id=None):
    """Class constructor; receives the ODBC URL"""

    # Save the ODBC URL as an instance variable
    self.odbcUrl = odbcUrl

    self.service_principal_id = service_principal_id
    self.service_principal_secret = service_principal_secret
    self.tenant_id = tenant_id

    self.use_service_principal = bool(self.service_principal_id)

    # This is our "skip marker"
    self.skip_marker = '>>>>>SKIP<<<<<'


  def executeSQLStatement(self, sqlStatement):

    # contextlib.closing guarantees the connection and cursor are closed;
    # note that `with conn:` already commits on a clean exit, so the
    # explicit commit below is redundant but harmless.
    with contextlib.closing(self.get_connection()) as conn:
      with conn:
        with contextlib.closing(conn.cursor()) as cursor:
          cursor.execute(sqlStatement)
      conn.commit()
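One thing worth checking is how the EXEC statement is built before it reaches executeSQLStatement. A minimal sketch (the procedure name, parameter name, and helper below are hypothetical, not from the post) that validates recordCount and uses a pyodbc-style `?` placeholder instead of string interpolation, so a missing or empty pipeline value fails loudly rather than silently writing 0:

```python
# Hypothetical helper - proc and parameter names are illustrative only.
def build_proc_call(proc_name, record_count):
  # A missing or empty pipeline parameter would otherwise end up as 0
  # in the logging table; fail loudly instead so the gap is visible.
  if record_count is None or str(record_count).strip() == "":
    raise ValueError("recordCount parameter was not supplied by the pipeline")
  # A '?' placeholder keeps the value out of the SQL string itself.
  return f"EXEC {proc_name} @recordCount = ?", (int(record_count),)

sql, params = build_proc_call("dbo.usp_UpdateLog", "42")
```

Note that the `executeSQLStatement` method above accepts only a SQL string; passing parameters this way would require forwarding `params` to `cursor.execute(sql, params)`.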

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-09-15T09:44:04.8366667+00:00

    I assume that the recordCount is derived from a parameter being passed to the notebook from the pipeline. Ensure the pipeline is sending the correct parameter values. It's possible that when you're manually running the notebook, you're inputting a non-zero value for recordCount, but the pipeline might be defaulting to 0.

    The pipeline might be able to update the timestamp due to having write permissions but might be restricted in some way when it comes to retrieving or processing data that determines the recordCount.
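If the notebook reads recordCount from a Synapse parameters cell, a silent fall-back to the cell's default would be consistent with the behavior described: the pipeline's base parameter name must match the notebook variable name exactly (names are case-sensitive), and on a mismatch the notebook default is used without any error. A small defensive check (a sketch, with the variable name assumed) makes that failure mode visible:

```python
# Sketch of a Synapse notebook "parameters" cell - the pipeline overrides
# this default only when its base parameter name matches exactly.
recordCount = 0

def check_record_count(value):
  """Coerce the notebook parameter and flag a suspicious default of 0."""
  count = int(value)
  if count == 0:
    print("Warning: recordCount is 0 - was the pipeline parameter actually passed?")
  return count
```

Logging a warning like this in the pipeline run would distinguish "the pipeline passed 0" from "the pipeline never passed the parameter at all."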

