Executing Azure SQL Stored Procedures in Spark Notebooks

Richards, Sam (DG-STL-HQ) 151 Reputation points
2023-09-14T19:40:04.0033333+00:00

I am running into an odd issue when using Spark notebooks to call a stored procedure that updates an Azure SQL table. I have an Azure Synapse pipeline that calls Spark notebooks to do a variety of things, and the second-to-last step updates a logging table in our Azure SQL DB.

When I run the Spark notebook manually (either by entering parameters myself or by using the notebook generated by the Synapse pipeline at runtime), the stored procedure executes and the table is updated. This stored procedure records the rows copied and the execution timestamp.

However, when the pipeline calls the Spark notebook and runs it, the notebook completes successfully, including the stored-procedure step, but the table shows an update only to the execution timestamp: the recordCount field is ALWAYS 0, not the value shown in the screenshot.


The screenshot below is from the Spark notebook generated at runtime:

Here is the Python:

import contextlib

# Get a secret from Key Vault (mssparkutils is provided by the Synapse runtime)
def getKeyVaultSecret(secretName):
  return mssparkutils.credentials.getSecret('synapsekeyvaultname', secretName, 'keyvaultname')

class ODBCSqlServer():
  """This class encapsulates an ODBC connection to a SQL Server instance"""

  def __init__(self, odbcUrl, service_principal_id=None, service_principal_secret=None, tenant_id=None):
    """Class constructor; receives the ODBC URL"""

    # Save the ODBC URL as an instance variable
    self.odbcUrl = odbcUrl

    self.service_principal_id = service_principal_id
    self.service_principal_secret = service_principal_secret
    self.tenant_id = tenant_id

    self.use_service_principal = bool(self.service_principal_id)

    # This is our "skip marker"
    self.skip_marker = '>>>>>SKIP<<<<<'


  def executeSQLStatement(self, sqlStatement):

    # contextlib.closing guarantees the connection and cursor are closed;
    # note that `with conn:` already commits on a clean exit, so the
    # explicit commit below is redundant but harmless.
    with contextlib.closing(self.get_connection()) as conn:
      with conn:
        with contextlib.closing(conn.cursor()) as cursor:
          cursor.execute(sqlStatement)
      conn.commit()
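One thing worth checking is how the EXEC statement is built before it reaches executeSQLStatement. A minimal sketch (the procedure name, parameter name, and helper below are hypothetical, not from the post) that validates recordCount and uses a pyodbc-style `?` placeholder instead of string interpolation, so a missing or empty pipeline value fails loudly rather than silently writing 0:

```python
# Hypothetical helper - proc and parameter names are illustrative only.
def build_proc_call(proc_name, record_count):
  # A missing or empty pipeline parameter would otherwise end up as 0
  # in the logging table; fail loudly instead so the gap is visible.
  if record_count is None or str(record_count).strip() == "":
    raise ValueError("recordCount parameter was not supplied by the pipeline")
  # A '?' placeholder keeps the value out of the SQL string itself.
  return f"EXEC {proc_name} @recordCount = ?", (int(record_count),)

sql, params = build_proc_call("dbo.usp_UpdateLog", "42")
```

Note that the `executeSQLStatement` method above accepts only a SQL string; passing parameters this way would require forwarding `params` to `cursor.execute(sql, params)`.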

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-09-15T09:44:04.8366667+00:00

    I assume that the recordCount is derived from a parameter being passed to the notebook from the pipeline. Ensure the pipeline is sending the correct parameter values. It's possible that when you're manually running the notebook, you're inputting a non-zero value for recordCount, but the pipeline might be defaulting to 0.

    The pipeline might be able to update the timestamp due to having write permissions but might be restricted in some way when it comes to retrieving or processing data that determines the recordCount.
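If the notebook reads recordCount from a Synapse parameters cell, a silent fall-back to the cell's default would be consistent with the behavior described: the pipeline's base parameter name must match the notebook variable name exactly (names are case-sensitive), and on a mismatch the notebook default is used without any error. A small defensive check (a sketch, with the variable name assumed) makes that failure mode visible:

```python
# Sketch of a Synapse notebook "parameters" cell - the pipeline overrides
# this default only when its base parameter name matches exactly.
recordCount = 0

def check_record_count(value):
  """Coerce the notebook parameter and flag a suspicious default of 0."""
  count = int(value)
  if count == 0:
    print("Warning: recordCount is 0 - was the pipeline parameter actually passed?")
  return count
```

Logging a warning like this in the pipeline run would distinguish "the pipeline passed 0" from "the pipeline never passed the parameter at all."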

