Synapse Spark: Python logging to log file in Azure Data Lake Storage

Sumit Bhatnagar 1 Reputation point
2023-02-01T04:35:10.99+00:00

I am working in Synapse Spark and building a logger function to handle error logging. I intend to push the logs to an existing log file (data.log) located in AzureDataLakeStorageAccount/Container/Folder/.

In addition to the root logger, I have added a StreamHandler and am trying to set up a FileHandler to manage the log file write-out.

The log file path I am specifying is in this format: 'abfss:/container@storageaccountname.dfs.core.windows.net/Data/logging/data.log'

When I run the below code, I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_<number>/container_<number>/abfss:/container@storageaccountname.dfs.core.windows.net/Data/logging/data.log'

The default mount path is getting prefixed to the ADLS file path.

To work around the mount path prefix I prepended a series of '../' to move up the directory levels, but even with this I end up with a solitary '/' prefixed to my ADLS path.

I have not found any online guidance or article where this has been implemented in an Azure Data Lake setup. Any assistance will be appreciated.

Here is the code:


import logging
import sys

# Path as given above. logging.FileHandler cannot resolve abfss URLs,
# which is what triggers the FileNotFoundError quoted earlier.
LogFilepath = 'abfss:/container@storageaccountname.dfs.core.windows.net/Data/logging/data.log'

def init_logger(name: str, logging_level: int = logging.DEBUG) -> logging.Logger:
    _log_format = "%(levelname)s %(asctime)s %(name)s: %(message)s"
    _date_format = "%Y-%m-%d %I:%M:%S %p %z"
    _formatter = logging.Formatter(fmt=_log_format, datefmt=_date_format)

    _root_logger = logging.getLogger()
    _logger = logging.getLogger(name)
    _logger.setLevel(logging_level)

    # If the root logger already has handlers (Synapse configures some by
    # default), reuse them and just apply our formatter; records from
    # _logger propagate up to the root. Otherwise attach a StreamHandler.
    if _root_logger.handlers:
        for handler in _root_logger.handlers:
            handler.setFormatter(_formatter)
    else:
        _stream_handler = logging.StreamHandler(sys.stderr)
        _stream_handler.setLevel(logging_level)
        _stream_handler.setFormatter(_formatter)
        _logger.addHandler(_stream_handler)

    # FileHandler for the ADLS log file -- this is the line that fails.
    _file_handler = logging.FileHandler(LogFilepath, 'a')
    _file_handler.setLevel(logging_level)
    _file_handler.setFormatter(_formatter)
    _logger.addHandler(_file_handler)

    return _logger


1 answer

  1. MartinJaffer-MSFT 26,021 Reputation points
    2023-02-06T22:19:15.9466667+00:00

    @Sumit Bhatnagar Hello and welcome to Microsoft Q&A

    I understand you are trying to set up advanced logging (something I don't have much experience with). The issue is where the logger writes to.

    In my experience, most commands that typically write to the local file system do not understand abfss or other cloud storage protocols. It would be like doing with open("https://bing.com", "w") as myfile:

    The sys and os machinery, or whatever library open uses underneath, does not know how to speak the https protocol.

    So instead, open looks in the local file system (which is where that mount path prefix comes from).
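
    For example, calling Python's built-in open() on the abfss URL reproduces the problem: the whole string is treated as a relative local path and resolved against the working directory (that mount path). A minimal illustration, reusing the path from the question:

        # open() knows nothing about URL schemes; the string below is
        # interpreted as a relative local path, so resolution fails with
        # FileNotFoundError: [Errno 2] No such file or directory: ...
        path = "abfss:/container@storageaccountname.dfs.core.windows.net/Data/logging/data.log"
        with open(path, "a") as f:
            f.write("test")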

    Take a look at this fsspec tutorial for writing to the data lake. I think you might be able to replace the logging FileHandler with a handler that uses fsspec instead.
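
    As a rough sketch (untested, and assuming the adlfs package is installed so that fsspec can resolve abfss URLs; the handler class and credential keywords here are made up for illustration), that could look like:

        import logging
        import fsspec

        class FsspecHandler(logging.Handler):
            """Hypothetical handler that writes records to a cloud URL via fsspec."""

            def __init__(self, url: str, level: int = logging.NOTSET, **storage_options):
                super().__init__(level)
                self._url = url
                self._storage_options = storage_options  # adlfs credentials, if needed

            def emit(self, record: logging.LogRecord) -> None:
                try:
                    # Append ('a') support varies by fsspec backend; if the ADLS
                    # backend rejects it, buffer records and rewrite the file.
                    with fsspec.open(self._url, "a", **self._storage_options) as f:
                        f.write(self.format(record) + "\n")
                except Exception:
                    self.handleError(record)

        # Usage -- note the canonical double slash after abfss:
        # logger.addHandler(FsspecHandler(
        #     "abfss://container@storageaccountname.dfs.core.windows.net/Data/logging/data.log"))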

    Also take a look at the Spark filesystem utilities (mssparkutils.fs).
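
    For instance, here is a sketch assuming mssparkutils.fs.append, the Synapse utility that appends a string to a file and can create it if it does not exist (the handler class itself is made up for illustration):

        import logging
        from notebookutils import mssparkutils  # available on Synapse Spark pools

        class MSSparkUtilsHandler(logging.Handler):
            """Hypothetical handler that appends each record to an ADLS file
            with mssparkutils.fs.append. This makes one service call per
            record, so consider buffering if log volume is high."""

            def __init__(self, url: str, level: int = logging.NOTSET):
                super().__init__(level)
                self._url = url

            def emit(self, record: logging.LogRecord) -> None:
                try:
                    # Third argument: create the file if it does not exist yet.
                    mssparkutils.fs.append(self._url, self.format(record) + "\n", True)
                except Exception:
                    self.handleError(record)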