Logging from my module in Synapse

Victor Seifert 46 Reputation points
2022-03-08T08:27:15.197+00:00

I have a local application that I package as a .whl file and upload to my Synapse Spark pools.

In the application, I use a logger like this:

import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logger.info("This is a logging line.")

I noticed that once I import my module and execute it in Synapse, no logging messages are displayed. I also tried changing the logger to output the messages to stdout:

import sys

stdout_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stdout_handler)

and while it works locally ("This is a logging line." shows up on my normal stdout instead of as a logging message) - I still don't see any logging messages in Synapse.

What do I have to configure to be able to see the logger.info messages below the Synapse Notebook cells when I import and execute the module in Synapse?

I tried the following solution from the comments, but it didn't get me anywhere.

 log4jLogger = sc._jvm.org.apache.log4j
 LOGGER = log4jLogger.LogManager.getLogger(__name__)
 LOGGER.info("pyspark script logger initialized")

I also tried the following (replaced <packagename> with the name of my package in which I log messages):

 import logging
 logger = logging.getLogger("<packagename>")
 logger.setLevel(logging.INFO)

and it works the first time I run the code with "Run all" - but once I re-run a cell, I get an error (probably because Synapse closed the logging file?)

 Traceback (most recent call last):
   File "/home/trusted-service-user/cluster-env/env/lib/python3.8/logging/__init__.py", line 1088, in emit
     stream.write(msg + self.terminator)
   File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_<appnumber>/container_<container_number/tmp/<number>", line         565, in write
     super(PipeOutput, self).write(message)
   File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_<appnumber>/container_<container_number/tmp/<number>", line 555, in write
     super(UnicodeDecodingStringIO, self).write(s)
 ValueError: I/O operation on closed file

I get the same error if I execute functionality from my package that tries to log something: the first time it works, but the second time I execute it in the same notebook (even in the same cell) it crashes with the error above.


Accepted answer
PRADEEPCHEEKATLA-MSFT 79,376 Reputation points Microsoft Employee
2022-03-11T07:30:22.67+00:00

    Hello @Victor Seifert,

    Thanks for the question and using MS Q&A platform.

    (UPDATE 14/3/2022): As per the discussion with the internal team, the reason you are facing this "I/O operation on closed file" issue is this code:

     stdout_handler = logging.StreamHandler(sys.stdout)
     logger.addHandler(stdout_handler)
    

    For this issue: you are passing sys.stdout to StreamHandler(), which by default writes to sys.stderr - see https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler.

    In summary, you should avoid code like logging.StreamHandler(sys.stdout) or logging.StreamHandler(sys.stderr): because of Livy's original design, the stream such a handler holds a reference to is closed between cell executions, which leads to the error above.
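    If such a handler has already been attached in an earlier cell run, a minimal cleanup sketch (assuming the handler was added to your module's named logger) is:

     import logging

     logger = logging.getLogger(__name__)
     # Drop handlers attached in a previous cell execution; they may still
     # hold references to streams that Livy has since closed.
     for handler in list(logger.handlers):
         logger.removeHandler(handler)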

    Instead, use the logging module directly, as in the example below. The log messages can then be found in both the cell output and the driver log.

    --------------------------------------------------------

    You may try the sample code below for Python logging with a different format and customized log levels.

    import logging

    # Customize the logging format for all loggers
    FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
    formatter = logging.Formatter(fmt=FORMAT)
    for handler in logging.getLogger().handlers:
        handler.setFormatter(formatter)

    # Customize the log level for all loggers
    logging.getLogger().setLevel(logging.INFO)

    # Customize the log level for a specific logger
    customizedLogger = logging.getLogger('customized')
    customizedLogger.setLevel(logging.WARNING)

    # logger that uses the default global log level
    defaultLogger = logging.getLogger('default')

    defaultLogger.debug("default debug message")
    defaultLogger.info("default info message")
    defaultLogger.warning("default warning message")
    defaultLogger.error("default error message")
    defaultLogger.critical("default critical message")

    # logger that uses the customized log level
    customizedLogger.debug("customized debug message")
    customizedLogger.info("customized info message")
    customizedLogger.warning("customized warning message")
    customizedLogger.error("customized error message")
    customizedLogger.critical("customized critical message")
    

    Here is the expected output of the sample code above.

    [Screenshot: the formatted log messages shown in the cell output.]
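    To map this back to the original question (a logger created at import time inside a packaged module), a minimal sketch could look like the following; mypackage and do_work are hypothetical stand-ins for your actual package and function:

     import logging

     # Raise the level on the package's logger. Its records propagate up to
     # the root logger, whose pre-attached handlers (as in the sample above)
     # render them below the cell - no StreamHandler(sys.stdout) is needed.
     logging.getLogger("mypackage").setLevel(logging.INFO)

     import mypackage          # hypothetical package uploaded as a .whl
     mypackage.do_work()       # hypothetical function calling logger.info(...)

    Setting the level on the named logger, rather than attaching new handlers, avoids holding references to Livy-managed streams entirely.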

    Hope this helps. Please let us know if you have any further queries.


0 additional answers