Hi Javier Guerrero, thank you for posting your query on the Microsoft Q&A Platform.
It looks like the pyarrowfs-adlsgen2 library cannot take a connection string directly. It uses azure.identity.DefaultAzureCredential, which looks for credentials in several places, including environment variables. To know more about it, check this link.
In your case, since you want to work with a connection string only, I would suggest using azure.storage.filedatalake.DataLakeFileClient and pandas. Below is the code.
import io

import pandas as pd
from azure.storage.filedatalake import DataLakeFileClient

# Replace these values with your own
account_url = "https://accountName.dfs.core.windows.net"
file_system_name = "containerName"
file_path = "folder/sample.parquet"
credential = "accountkey"  # storage account access key

# Create a DataLakeFileClient object for the specified file
file_client = DataLakeFileClient(
    account_url=account_url,
    file_system_name=file_system_name,
    file_path=file_path,
    credential=credential,
)

# Download the parquet file contents into memory
downloader = file_client.download_file()
data = downloader.readall()

# Read the parquet bytes into a pandas DataFrame
df = pd.read_parquet(io.BytesIO(data), engine="pyarrow")
print(df)
The code above does not save the file to local disk. It downloads the file contents into an in-memory stream and reads the DataFrame from there.
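If you would rather pass the connection string itself instead of the account URL and key, a minimal sketch along the same lines could use DataLakeFileClient.from_connection_string. The function name read_parquet_via_connection_string is hypothetical (it is not part of any library), and the values in the usage comment are placeholders that you would replace with your own.

```python
import io


def read_parquet_via_connection_string(conn_str, file_system_name, file_path):
    """Download a parquet file from ADLS Gen2 into memory and return a DataFrame.

    Hypothetical helper for illustration; not part of the Azure SDK.
    """
    # Imported inside the function so this module can still be imported
    # on machines where the Azure SDK or pandas is not installed.
    import pandas as pd
    from azure.storage.filedatalake import DataLakeFileClient

    # Build the file client from the connection string
    file_client = DataLakeFileClient.from_connection_string(
        conn_str=conn_str,
        file_system_name=file_system_name,
        file_path=file_path,
    )

    # Download the whole file into memory; nothing is written to local disk
    data = file_client.download_file().readall()

    # Parse the in-memory bytes as parquet
    return pd.read_parquet(io.BytesIO(data), engine="pyarrow")


# Example usage (placeholder values, replace with your own):
# conn_str = (
#     "DefaultEndpointsProtocol=https;AccountName=accountName;"
#     "AccountKey=accountkey;EndpointSuffix=core.windows.net"
# )
# df = read_parquet_via_connection_string(
#     conn_str, "containerName", "folder/sample.parquet"
# )
# print(df)
```

Like the code above, this reads the whole file into memory before parsing it, so it is best suited to files that fit comfortably in RAM.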
Hope this helps. Please let me know how it goes.
Please consider hitting the Accept Answer button. Accepted answers help the community as well.