NoneType object error related to Blob storage

Question

Mingkai Liu 55

After upgrading my Azure Functions to runtime ~4 and Python to 3.11, as described in my previous post https://learn.microsoft.com/en-us/answers/questions/1368325/microsoft-azure-functions-extensionbundle-error-re, most of them run fine. One of them, however, shows the error below, which seems to be related to Blob storage:

[Image: error screenshot]

I then rolled back to ~2 and Python 3.7 (it was 3.6 before upgrading), and a similar error still occurs.

[Image: error screenshot]

Any idea what could be the reason?

Thank you very much!

Mingkai Liu 55

The function where the error occurs looks like this:

def __read_from_blob_storage(container_name: Union[Path, PathLike, str], file_name: Union[PathLike, str]) -> BinaryIO:
    """read raw json data from storage blob

    Args:
        container_name (Union[Path, PathLike, str]): [description]
        file_name (Union[PathLike, str]): [description]

    Returns:
        BinaryIO: [description]
    """

    try:
        # Retrieve the connection string for use with the application.
        connect_str = os.getenv("flpwrenvirogreendata_Connection_Str")
        logging.info(f"Get connection string: {connect_str}")
        # Create the BlobServiceClient object which will be used to create a container client
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        logging.info(f"Created blob service client {blob_service_client}")
        # Create a blob client using the file name as the name for the blob
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_name)
        logging.info(f"Created blob client {blob_client}")
        # read  file from blob as binary
        downloader = blob_client.download_blob()
        logging.info(f"downloader {downloader }")
        content_byte = downloader.readall()
        logging.info(f"content_byte{type(content_byte)}")

        try:
            #"pyxlsb" "openpyxl"
            dfs = pd.read_excel(downloader.readall(), sheet_name=None, engine="openpyxl") 
            logging.info(f"read excel {type(content_byte)}")        
        except pd.errors.ParserError as e:
            # Pass the exception as a format argument so logging can render it
            logging.error("ParserError: %s", e)
        
        
        sheets_name = list(dfs.keys())
        logging.info(f"sheets_name {sheets_name}")
        logging.info(
            f"Read the raw data from directory in blob successfully. Container: {container_name}, File: {file_name}"
        )

        return dfs, sheets_name, content_byte

    except Exception as e:
        logging.error(e)

I have added a logging.info call after each operation to find out which part goes wrong.

Now I can narrow the problem down to this line:

dfs = pd.read_excel(downloader.readall(), sheet_name=None, engine="openpyxl")

Based on the logging output below, the read_excel operation does not succeed, even though the content returned by downloader.readall() is not empty and is of type 'bytes'.

[Image: logging output screenshot]

Any hint will be really appreciated.

Cheers

MuthuKumaranMurugaachari-MSFT 22,441 Reputation points Moderator

2023-09-18T20:06:16.2766667+00:00

Mingkai Liu, thanks for posting your question in Microsoft Q&A. After reviewing the error message and code snippet, it appears that you are using the pandas library to read Excel data from Azure Storage and got the error "File is not a zip file". Many similar errors have been reported in the GitHub repo, and I suspect the cause is how the library reads in-memory or partial data rather than Azure Functions itself.

I suggest validating the code snippet outside Azure Functions to see whether you can read the Excel file successfully. If not, reach out to the community experts via https://github.com/pandas-dev/pandas/issues. They have the technical expertise on that library to assist you further.

If you have any questions about Azure Functions, or if the issue turns out to be related to Azure Functions, let me know the details (share any docs or articles you followed). I would be happy to help.

Mingkai Liu 55 Reputation points

2023-09-19T05:30:06.2666667+00:00

Thank you @MuthuKumaranMurugaachari-MSFT. Interestingly, I found a solution: wrapping the binary data in BytesIO(), as suggested by ChatGPT: "It can be helpful because it provides an in-memory file-like object that Pandas can work with..."

dfs = pd.read_excel(BytesIO(downloader.readall()), sheet_name=None, engine="openpyxl")

I don't know whether it is due to my upgrade beyond Python 3.6, but the trick seems to work.

MuthuKumaranMurugaachari-MSFT 22,441 Reputation points Moderator

2023-09-19T13:46:17.21+00:00

Mingkai Liu Awesome! I am very happy to hear that you found the solution and greatly appreciate you sharing the solution with the community.

Accepted answer

Answer 1

Mingkai Liu, thanks for sharing the solution in the thread so that it will benefit others experiencing a similar issue. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I am reposting your solution in case you would like to accept the answer for greater visibility.

Issue:

After upgrading to Python 3.11 and runtime ~4, you experienced the errors "File is not a zip file" and "cannot unpack non-iterable NoneType object" with the following code (using the pandas library):

    try:
        # Retrieve the connection string for use with the application.
        connect_str = os.getenv("flpwrenvirogreendata_Connection_Str")
        logging.info(f"Get connection string: {connect_str}")
        # Create the BlobServiceClient object which will be used to create a container client
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        logging.info(f"Created blob service client {blob_service_client}")
        # Create a blob client using the file name as the name for the blob
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_name)
        logging.info(f"Created blob client {blob_client}")
        # read  file from blob as binary
        downloader = blob_client.download_blob()
        logging.info(f"downloader {downloader }")
        content_byte = downloader.readall()
        logging.info(f"content_byte{type(content_byte)}")

        try:
            #"pyxlsb" "openpyxl"
            dfs = pd.read_excel(downloader.readall(), sheet_name=None, engine="openpyxl") 
            logging.info(f"read excel {type(content_byte)}")        
        except pd.errors.ParserError as e:
            # Pass the exception as a format argument so logging can render it
            logging.error("ParserError: %s", e)
        
        
        sheets_name = list(dfs.keys())
        logging.info(f"sheets_name {sheets_name}")
        logging.info(
            f"Read the raw data from directory in blob successfully. Container: {container_name}, File: {file_name}"
        )

        return dfs, sheets_name, content_byte

    except Exception as e:
        logging.error(e)

While debugging, you found the issue was specifically with the line dfs = pd.read_excel(downloader.readall(), sheet_name=None, engine="openpyxl")
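
A note on the second error: the function catches every exception, logs it, and then falls off the end, so it implicitly returns None. If the caller unpacks three values from the result (the calling code is not shown in this thread, so that is an assumption), this would explain "cannot unpack non-iterable NoneType object" and would hide the real failure. A minimal sketch using hypothetical stand-in functions, not the original code:

```python
import logging
import zipfile
from typing import Optional, Tuple

Result = Tuple[dict, list, bytes]


def load_swallowing_errors(fail: bool) -> Optional[Result]:
    """Hypothetical stand-in with the same try/except shape as the function above."""
    try:
        if fail:
            raise zipfile.BadZipFile("File is not a zip file")
        return {"Sheet1": []}, ["Sheet1"], b"raw"
    except Exception as e:
        logging.error(e)
        # No return statement here, so the function returns None implicitly.


def load_reraising(fail: bool) -> Result:
    """Same logic, but the error is logged with a traceback and re-raised."""
    try:
        if fail:
            raise zipfile.BadZipFile("File is not a zip file")
        return {"Sheet1": []}, ["Sheet1"], b"raw"
    except Exception:
        logging.exception("Reading the workbook failed")
        raise


def unpack_error_message() -> str:
    """Show what a caller sees when it unpacks the swallowed failure."""
    try:
        dfs, sheets_name, content_byte = load_swallowing_errors(fail=True)
    except TypeError as e:
        return str(e)
    return ""
```

Re-raising (or returning an explicit error value) keeps the original exception visible to the caller instead of turning it into an unrelated TypeError.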

Solution:

You resolved the issue by wrapping the binary data in BytesIO before passing it to pd.read_excel:

dfs = pd.read_excel(BytesIO(downloader.readall()), sheet_name=None, engine="openpyxl")
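
For anyone who still sees "File is not a zip file" after this change, it may help to check the downloaded bytes before handing them to pandas. An .xlsx workbook is a ZIP archive, so an empty or truncated download fails that check. The helper below is a hypothetical sketch (the name as_excel_buffer is not from the thread) that uses only the standard library:

```python
import zipfile
from io import BytesIO


def as_excel_buffer(data: bytes) -> BytesIO:
    """Wrap downloaded bytes in a seekable buffer after a basic ZIP sanity check.

    Being a ZIP archive is necessary for an .xlsx file but not sufficient,
    so this only catches empty, truncated, or clearly wrong content.
    """
    if not data:
        raise ValueError("Downloaded blob is empty")
    buffer = BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("Blob content is not a ZIP archive, so it cannot be an .xlsx file")
    # is_zipfile moves the read position, so rewind before returning.
    buffer.seek(0)
    return buffer
```

With the code from the question, this could be used as pd.read_excel(as_excel_buffer(content_byte), sheet_name=None, engine="openpyxl"), which reuses the bytes already stored in content_byte instead of calling downloader.readall() a second time.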
