Synapse Spark Pool write to Excel workbook on ADLS2?

Jason Campbell 51 Reputation points
2021-05-27T19:48:34.717+00:00

How can I write from a Synapse Spark pool to an Excel workbook (with multiple sheets) on ADLS Gen2?

Something like pandas' ExcelWriter().

Azure Synapse Analytics

Accepted answer
  1. Samara Soucy - MSFT 5,141 Reputation points
    2021-06-01T00:36:46.34+00:00

You can use pandas' ExcelWriter() by writing to an io.BytesIO buffer instead of a file. The Blob Storage and Data Lake Storage endpoints both work in much the same way.

    Setting up the file:

    from pandas import ExcelWriter
    from pandas import DataFrame
    import pandas
    import io
    import xlsxwriter
    
    # Import whichever client you plan to use
    from azure.storage.blob import BlobServiceClient, BlobClient
    # OR
    from azure.storage.filedatalake import DataLakeServiceClient
    
    d = {'col1': [1, 2], 'col2': [3, 4]}
    df = pandas.DataFrame(data=d)
    
    # In-memory buffer that will receive the workbook bytes.
    # (Note: this rebinds the name 'io' after 'import io'; the module
    # isn't needed again afterwards.)
    io = io.BytesIO()
    
    # Use a temp filename to keep pandas happy.
    writer = pandas.ExcelWriter('temp.xlsx', engine='xlsxwriter')
    
    # Point the underlying xlsxwriter workbook at the buffer instead of the file.
    writer.book.filename = io
    
    # Write the data frame to the BytesIO buffer.
    df.to_excel(writer)
    writer.save()
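The question asks for multiple sheets: each `to_excel` call can target its own sheet via `sheet_name`. As a side note, recent pandas versions (1.2+) accept a binary buffer directly, so the temp-filename workaround above isn't needed there. A minimal sketch (not from the thread; pandas picks whichever Excel engine is installed, e.g. openpyxl or xlsxwriter):

```python
import io

import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df2 = pd.DataFrame({'col3': [5, 6], 'col4': [7, 8]})

buffer = io.BytesIO()

# Each to_excel call writes one sheet of the same in-memory workbook.
with pd.ExcelWriter(buffer) as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    df2.to_excel(writer, sheet_name='Sheet2', index=False)

workbook_bytes = buffer.getvalue()  # ready to upload with either client below
```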
    

    Using Data Lake client:

    connStr = "<data lake url>"
    token = "<SAS token>"
    
    dl_service_client = DataLakeServiceClient(account_url=connStr, credential=token)
    file_client = dl_service_client.get_file_client("<container name>", "<file path>")
    
    # Pull the workbook bytes out of the buffer before uploading.
    data = io.getvalue()
    
    file_client.create_file()
    file_client.append_data(data, offset=0, length=len(data))
    file_client.flush_data(len(data))
    

    Blob client:

    connStr = "<storage url>"
    token = "<SAS token>"
    
    blob_service_client = BlobServiceClient(account_url=connStr, credential=token)
    blob_client = blob_service_client.get_blob_client("<container name>", "<file path>")
    blob_client.upload_blob(io.getvalue(), overwrite=True)
    
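Both upload paths consume the same workbook bytes, so a small helper (hypothetical, not part of the answer) makes the pattern reusable, including for Spark results collected to the driver with toPandas():

```python
import io

import pandas as pd

def to_xlsx_bytes(frames):
    """Render a {sheet_name: DataFrame} mapping into one .xlsx payload in memory."""
    buffer = io.BytesIO()
    # pandas picks whichever Excel engine is installed (openpyxl, xlsxwriter, ...)
    with pd.ExcelWriter(buffer) as writer:
        for sheet_name, df in frames.items():
            df.to_excel(writer, sheet_name=sheet_name, index=False)
    return buffer.getvalue()

# For a Spark DataFrame, collect it to pandas first, e.g.:
#   payload = to_xlsx_bytes({'results': spark_df.toPandas()})
payload = to_xlsx_bytes({'col_data': pd.DataFrame({'col1': [1, 2]})})
# 'payload' can now go to append_data()/upload_blob() in the snippets above.
```

Note that toPandas() pulls the whole DataFrame onto the driver, so this only suits result sets that fit in driver memory.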

2 additional answers

  1. Jason Campbell 51 Reputation points
    2021-06-03T02:35:57.537+00:00

    Hi

    I tried this against the blob client. I got the following error:

    ServiceRequestError : <urllib3.connection.HTTPSConnection object at 0x7fbe7fcb4668>: Failed to establish a new connection: [Errno -2] Name or service not known
    Traceback (most recent call last):
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/tracing/decorator.py", line 83, in wrapper_use_tracer
    return func(*args, **kwargs)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/storage/blob/_blob_client.py", line 539, in upload_blob
    return upload_block_blob(**options)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/storage/blob/_upload_helpers.py", line 106, in upload_block_blob
    **kwargs)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/storage/blob/_generated/operations/_block_blob_operations.py", line 217, in upload
    pipeline_response = self._client._pipeline.run(request, stream=False, **kwargs)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 211, in run
    return first_node.send(pipeline_request) # type: ignore
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
    [Previous line repeated 5 more times]
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/policies/_redirect.py", line 157, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/storage/blob/_shared/policies.py", line 515, in send
    raise err
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/storage/blob/_shared/policies.py", line 489, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/storage/blob/_shared/policies.py", line 290, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 71, in send
    response = self.next.send(request)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/_base.py", line 103, in send
    self._sender.send(request.http_request, **request.context.options),
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/storage/blob/_shared/base_client.py", line 319, in send
    return self._transport.send(request, **kwargs)
    File "/home/trusted-service-user/cluster-env/env/lib/python3.6/site-packages/azure/core/pipeline/transport/_requests_basic.py", line 284, in send
    raise error
    azure.core.exceptions.ServiceRequestError: <urllib3.connection.HTTPSConnection object at 0x7fbe7fcb4668>: Failed to establish a new connection: [Errno -2] Name or service not known


  2. Jason Campbell 51 Reputation points
    2021-06-04T02:49:52.797+00:00

    This is now working for me - thanks!

I changed from a SAS token to the storage account access key, and I'm using DataLakeServiceClient. (I also changed the 'data' variable to use io.getvalue().)

    Cheers,
    Jason
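For anyone else landing here, the working combination Jason describes (storage account access key plus DataLakeServiceClient) can be sketched as below. The helper name, account, and path are placeholders, and the SDK import is deferred into the function so the sketch can be read (and the helper defined) without the package installed:

```python
def upload_xlsx(payload, account_name, account_key, container, path):
    """Upload workbook bytes to ADLS Gen2 using the storage account access key."""
    # Deferred import: only needed when the upload actually runs.
    from azure.storage.filedatalake import DataLakeServiceClient

    account_url = f"https://{account_name}.dfs.core.windows.net"
    # Passing the account key string as 'credential' authenticates with shared key.
    service = DataLakeServiceClient(account_url=account_url, credential=account_key)
    file_client = service.get_file_client(container, path)
    file_client.create_file()
    file_client.append_data(payload, offset=0, length=len(payload))
    file_client.flush_data(len(payload))

# Usage (placeholders):
#   upload_xlsx(io.getvalue(), "<account>", "<access key>", "<container>", "reports/output.xlsx")
```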

