Mingkai Liu, thanks for sharing the solution in this thread so that it can benefit others experiencing a similar issue. Because the Microsoft Q&A community policy states that "The question author cannot accept their own answer. They can only accept answers by others," I will repost your solution here in case you'd like to accept it for greater visibility.
Issue:
After upgrading to Python 3.11/V4, you encountered the errors "File is not a zip file" and "cannot unpack non-iterable NoneType object" with the following code (using the pandas library):
```python
import logging
import os

import pandas as pd
from azure.storage.blob import BlobServiceClient


def read_raw_data(container_name, file_name):  # hypothetical name; the post shows only the function body
    try:
        # Retrieve the connection string for use with the application.
        connect_str = os.getenv("flpwrenvirogreendata_Connection_Str")
        logging.info(f"Get connection string: {connect_str}")
        # Create the BlobServiceClient object which will be used to create a container client
        blob_service_client = BlobServiceClient.from_connection_string(connect_str)
        logging.info(f"Created blob service client {blob_service_client}")
        # Create a blob client using the file name as the name for the blob
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=file_name)
        logging.info(f"Created blob client {blob_client}")
        # Read the file from the blob as binary
        downloader = blob_client.download_blob()
        logging.info(f"downloader {downloader}")
        content_byte = downloader.readall()
        logging.info(f"content_byte {type(content_byte)}")
        try:
            # "pyxlsb" "openpyxl"
            # This call raised "File is not a zip file" after the upgrade
            dfs = pd.read_excel(downloader.readall(), sheet_name=None, engine="openpyxl")
            logging.info(f"read excel {type(content_byte)}")
        except pd.errors.ParserError as e:
            logging.error("ParserError: %s", e)  # use a %s placeholder for the extra argument
        sheets_name = list(dfs.keys())
        logging.info(f"sheets_name {sheets_name}")
        logging.info(
            f"Read the raw data from directory in blob successfully. Container: {container_name}, File: {file_name}"
        )
        return dfs, sheets_name, content_byte
    except Exception as e:
        logging.error(e)
```
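The "File is not a zip file" text is the message of `zipfile.BadZipFile` from Python's standard library. An .xlsx workbook is a ZIP archive and openpyxl opens it through `zipfile`, so the message means the data handed to openpyxl was not recognized as a ZIP archive (empty bytes trigger it too). A minimal standard-library reproduction, using a hypothetical helper `open_as_zip`:

```python
import zipfile
from io import BytesIO


def open_as_zip(data: bytes) -> zipfile.ZipFile:
    # Hypothetical helper: an .xlsx workbook is a ZIP container,
    # so reading one starts by opening it with zipfile
    return zipfile.ZipFile(BytesIO(data))


for sample in (b"", b"this is not a workbook"):
    try:
        open_as_zip(sample)
    except zipfile.BadZipFile as err:
        # Both samples fail with: File is not a zip file
        print(f"{sample!r}: {err}")
```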
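While debugging, you found that the failure came specifically from the line `dfs = pd.read_excel(downloader.readall(), sheet_name=None, engine="openpyxl")`. The second error is most likely a side effect of the first: the outer `except` block only logs the exception, so the function returns `None`, and unpacking that `None` into three variables (as the caller presumably does) raises "cannot unpack non-iterable NoneType object".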
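A minimal sketch of how a logged-and-swallowed exception turns into the unpacking error, using a hypothetical stand-in function `load_workbook_sketch` with the same try/except shape as the code above (the failure is simulated; no blob storage or pandas is involved):

```python
import logging


def load_workbook_sketch(fail: bool):
    """Hypothetical stand-in with the same try/except shape as the original code."""
    try:
        if fail:
            raise ValueError("File is not a zip file")  # simulate the read_excel failure
        return {"Sheet1": None}, ["Sheet1"], b"PK"
    except Exception as e:
        # Logged and swallowed: execution falls off the end, so the function returns None
        logging.error(e)


try:
    dfs, sheets_name, content_byte = load_workbook_sketch(fail=True)
except TypeError as err:
    print(err)  # cannot unpack non-iterable NoneType object
```

Adding a bare `raise` after `logging.error(e)` in the outer `except` block would let the caller see the original "File is not a zip file" error instead of the less informative unpacking error.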
Solution:
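You resolved the issue by wrapping the binary data in an `io.BytesIO` object before passing it to `pd.read_excel`:

```python
from io import BytesIO

dfs = pd.read_excel(BytesIO(downloader.readall()), sheet_name=None, engine="openpyxl")
```

`BytesIO` turns the raw bytes into a seekable in-memory file-like object, which is what `pd.read_excel` expects; pandas 2.1 also deprecated passing byte strings directly and recommends this wrapping.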
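Since `content_byte` already holds the downloaded bytes, the same fix could reuse it instead of calling `downloader.readall()` a second time. The sketch below is an optional, hypothetical helper `to_excel_buffer` (not part of pandas or the Azure SDK) that wraps the bytes in `BytesIO` and fails early with a clearer message when they are empty or not a ZIP archive. It uses only the standard library; the pandas call is shown in the usage note.

```python
import zipfile
from io import BytesIO


def to_excel_buffer(content: bytes) -> BytesIO:
    """Wrap downloaded blob bytes in a seekable in-memory file for pd.read_excel.

    Hypothetical helper. An .xlsx workbook is a ZIP archive, so anything that
    is not one is rejected here with an explicit message instead of surfacing
    later as "File is not a zip file" from inside openpyxl.
    """
    if not content:
        raise ValueError("blob content is empty; check the container and file name")
    buffer = BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        raise ValueError(f"blob is not an .xlsx (ZIP) file; first bytes: {content[:8]!r}")
    buffer.seek(0)  # is_zipfile may move the stream position; rewind before reading
    return buffer
```

Usage, assuming the variables from the original code: `dfs = pd.read_excel(to_excel_buffer(content_byte), sheet_name=None, engine="openpyxl")`.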