Reading .sav files from ADLS Gen2 into Azure Synapse

Johan á Rogvi-Hansen 106 Reputation points
2023-01-11T09:43:45.0333333+00:00

Hi.

I have some pre-trained models saved as .sav files and uploaded to an ADLS Gen2 account.

To my knowledge, the .sav format is not supported by spark.read. Is there another way to read files like these?

Azure Synapse Analytics

2 answers

  1. Johan á Rogvi-Hansen 106 Reputation points
    2023-01-19T08:44:15.27+00:00

    Hi,

    I was able to do it by mounting the file location and using pickle. I did, however, still need to upload pyreadstat to the Spark pool. Specifically, after mounting:

    import pickle

    # job_id: ID of the current Spark job (e.g. from mssparkutils.env.getJobId())
    with open(f'/synfs/{job_id}/mnt/<folder_path>', 'rb') as pickle_file:
        content = pickle.load(pickle_file)
        print(content)
    
    1 person found this answer helpful.
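The mount-and-unpickle approach described in the answer above could be sketched roughly as follows. This is a minimal sketch, not the author's exact code: the helper names (synfs_path, load_pickled_model, mount_and_load), the "/mnt" mount point, and the linked-service argument are illustrative assumptions, and the mssparkutils calls only work inside a Synapse notebook session.

```python
import pickle
from pathlib import PurePosixPath


def synfs_path(job_id, mount_point, relative_path):
    """Build the local path under which a Synapse mount is exposed.

    Hypothetical helper: follows the /synfs/{job_id}/<mount>/<path> layout
    used in the answer above.
    """
    return str(
        PurePosixPath("/synfs")
        / str(job_id)
        / mount_point.strip("/")
        / relative_path.lstrip("/")
    )


def load_pickled_model(path):
    """Unpickle one file. Only unpickle files from a source you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def mount_and_load(container_url, linked_service, relative_path, mount_point="/mnt"):
    """Mount an ADLS Gen2 container and load one pickled file from it.

    Assumes a Synapse notebook, where notebookutils is available.
    container_url is an abfss:// URL; linked_service is the name of a
    linked service that can access the storage account.
    """
    from notebookutils import mssparkutils  # only importable inside Synapse

    mssparkutils.fs.mount(container_url, mount_point, {"linkedService": linked_service})
    job_id = mssparkutils.env.getJobId()
    return load_pickled_model(synfs_path(job_id, mount_point, relative_path))
```

Keeping the path construction and the unpickling in separate functions means both can be checked locally, outside Synapse; only mount_and_load depends on the notebook environment.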

  2. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-01-12T07:47:20.2033333+00:00

    Hi Johan á Rogvi-Hansen,

    Thank you for using this forum and posting your query.

    To read the .sav file format (SPSS), you could try the pyreadstat Python package (pip install pyreadstat). It reads and writes SAS (sas7bdat, sas7bcat, xport/xpt), SPSS (sav, zsav, por), and Stata (dta) files to and from pandas DataFrames.

    You could load the .sav file from your ADLS Gen2 account into a pandas DataFrame and then write it from there to your desired destination.

    Or you could try an approach similar to the one explained in this article: How to use the pyreadstat.read_sav function in pyreadstat

    Other helpful threads: pyreadstat read and write spss without data loss

    Hope this information helps.

    Thank you
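Because the .sav extension is used both for SPSS data files and, by convention, for pickled Python models, it may help to check which kind a file is before picking a reader. The sketch below is one possible way to do that, not an official method: the function names (sniff_sav_kind, read_sav_file) are hypothetical, and the check relies on the assumption that SPSS system files begin with the bytes "$FL2" or "$FL3" while pickles written with protocol 2 or later begin with the byte 0x80. pyreadstat is imported only when an SPSS file is detected.

```python
import pickle


def sniff_sav_kind(path):
    """Guess whether a .sav file is an SPSS system file or a Python pickle."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head in (b"$FL2", b"$FL3"):  # assumed SPSS system file signatures
        return "spss"
    if head[:1] == b"\x80":  # pickle protocol 2+ starts with the PROTO opcode
        return "pickle"
    return "unknown"


def read_sav_file(path):
    """Read a .sav file with pyreadstat or pickle, depending on its content."""
    kind = sniff_sav_kind(path)
    if kind == "spss":
        import pyreadstat  # third-party; must be installed on the Spark pool

        df, meta = pyreadstat.read_sav(path)  # returns (DataFrame, metadata)
        return df
    if kind == "pickle":
        # Only unpickle files from a source you trust.
        with open(path, "rb") as fh:
            return pickle.load(fh)
    raise ValueError(f"Unrecognized .sav content in {path!r}")
```

For a pre-trained model saved with pickle, read_sav_file returns the unpickled object; for an SPSS file it returns a pandas DataFrame, which can then be passed to spark.createDataFrame if a Spark DataFrame is needed.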


