Azure Web App Exception in Worker Process "LookupError" for NLTK : nltk.data.find()

Sintrias 96 Reputation points
2021-10-22T03:58:12.25+00:00

I have an Azure Web App hosting an AI API built on the Questgen question generator project. I use my own fork of the project, whose only real change is that it works with the latest version of Sense2Vec. The app works fine on my local machine, but after I deploy it and send a GET request over HTTPS from Postman, which should just return 'Hello World!', I get an error in the container. I've pasted the full container log below. The error is a LookupError raised from root = nltk.data.find(f"{self.subdir}/{self.__name}"), which ends in raise LookupError(resource_not_found). This doesn't make sense to me, because my code calls nltk.download() for the resources it needs.

This has me questioning my understanding of how Azure Web Apps work. I've been learning about Docker containers recently, so when the log says that the container failed, does that mean a new container process is started for each incoming HTTP request? If so, that would be a problem, because the NLTK resources would be downloaded every time, and that download is the step that uses the most hardware resources. If that is the case, how can I fix it? Do I need a different Azure resource than Web Apps or Web APIs?

2021-10-22T02:42:49.727079159Z   _____                               
2021-10-22T02:42:49.727084159Z   /  _  \ __________ _________   ____  
2021-10-22T02:42:49.727094459Z  /  /_\  \___   /  |  \_  __ \_/ __ \ 
2021-10-22T02:42:49.727098059Z /    |    \/    /|  |  /|  | \/\  ___/ 
2021-10-22T02:42:49.727101659Z \____|__  /_____ \____/ |__|    \___  >
2021-10-22T02:42:49.727106059Z         \/      \/                  \/ 
2021-10-22T02:42:49.727109659Z 
2021-10-22T02:42:49.727112859Z A P P   S E R V I C E   O N   L I N U X
2021-10-22T02:42:49.727116259Z 
2021-10-22T02:42:49.727119459Z Documentation: http://aka.ms/webapp-linux
2021-10-22T02:42:49.727122859Z Python 3.7.9
2021-10-22T02:42:49.727126059Z Note: Any data outside '/home' is not persisted
2021-10-22T02:42:49.845758523Z Starting OpenBSD Secure Shell server: sshd.
2021-10-22T02:42:49.870854191Z App Command Line not configured, will attempt auto-detect
2021-10-22T02:42:49.871712000Z Launching oryx with: create-script -appPath /home/site/wwwroot -output /opt/startup/startup.sh -virtualEnvName antenv -defaultApp /opt/defaultsite -bindPort 8000
2021-10-22T02:42:49.880675195Z Found build manifest file at '/home/site/wwwroot/oryx-manifest.toml'. Deserializing it...
2021-10-22T02:42:49.882728517Z Build Operation ID: |Naw/gARSU78=.7ac7fd71_
2021-10-22T02:42:49.883572726Z Oryx Version: 0.2.20210708.1, Commit: 6ceb6608673b94827bac111ef5ea01c216f92abb, ReleaseTagName: 20210708.1
2021-10-22T02:42:49.884122732Z Output is compressed. Extracting it...
2021-10-22T02:42:49.884876440Z Extracting '/home/site/wwwroot/output.tar.gz' to directory '/tmp/8d99502e5af8c42'...
2021-10-22T02:43:38.292543747Z App path is set to '/tmp/8d99502e5af8c42'
2021-10-22T02:43:38.530015948Z Detected an app based on Flask
2021-10-22T02:43:38.530987862Z Generating `gunicorn` command for 'app:app'
2021-10-22T02:43:38.714114285Z Writing output script to '/opt/startup/startup.sh'
2021-10-22T02:43:39.011214940Z Using packages from virtual environment antenv located at /tmp/8d99502e5af8c42/antenv.
2021-10-22T02:43:39.012112353Z Updated PYTHONPATH to ':/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages'
2021-10-22T02:43:39.760096764Z [2021-10-22 02:43:39 +0000] [36] [INFO] Starting gunicorn 20.1.0
2021-10-22T02:43:39.761640586Z [2021-10-22 02:43:39 +0000] [36] [INFO] Listening at: http://0.0.0.0:8000 (36)
2021-10-22T02:43:39.762272095Z [2021-10-22 02:43:39 +0000] [36] [INFO] Using worker: sync
2021-10-22T02:43:39.767909576Z [2021-10-22 02:43:39 +0000] [39] [INFO] Booting worker with pid: 39
2021-10-22T02:43:48.190191968Z [2021-10-22 02:43:48 +0000] [39] [ERROR] Exception in worker process
2021-10-22T02:43:48.190224768Z Traceback (most recent call last):
2021-10-22T02:43:48.190230668Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/nltk/corpus/util.py", line 84, in __load
2021-10-22T02:43:48.190243969Z     root = nltk.data.find(f"{self.subdir}/{zip_name}")
2021-10-22T02:43:48.190248069Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/nltk/data.py", line 583, in find
2021-10-22T02:43:48.190252069Z     raise LookupError(resource_not_found)
2021-10-22T02:43:48.190255769Z LookupError: 
2021-10-22T02:43:48.190259369Z **********************************************************************
2021-10-22T02:43:48.190262969Z   Resource stopwords not found.
2021-10-22T02:43:48.190267469Z   Please use the NLTK Downloader to obtain the resource:
2021-10-22T02:43:48.190271469Z 
2021-10-22T02:43:48.190274969Z   >>> import nltk
2021-10-22T02:43:48.190278869Z   >>> nltk.download('stopwords')
2021-10-22T02:43:48.190282669Z 
2021-10-22T02:43:48.190286269Z   For more information see: https://www.nltk.org/data.html
2021-10-22T02:43:48.190289869Z 
2021-10-22T02:43:48.190293369Z   Attempted to load corpora/stopwords.zip/stopwords/
2021-10-22T02:43:48.190297069Z 
2021-10-22T02:43:48.190300569Z   Searched in:
2021-10-22T02:43:48.190304169Z     - '/root/nltk_data'
2021-10-22T02:43:48.190307669Z     - '/opt/python/3.7.9/nltk_data'
2021-10-22T02:43:48.190311169Z     - '/opt/python/3.7.9/share/nltk_data'
2021-10-22T02:43:48.190314670Z     - '/opt/python/3.7.9/lib/nltk_data'
2021-10-22T02:43:48.190318370Z     - '/usr/share/nltk_data'
2021-10-22T02:43:48.190321970Z     - '/usr/local/share/nltk_data'
2021-10-22T02:43:48.190325470Z     - '/usr/lib/nltk_data'
2021-10-22T02:43:48.190329070Z     - '/usr/local/lib/nltk_data'
2021-10-22T02:43:48.190332570Z **********************************************************************
2021-10-22T02:43:48.190336170Z 
2021-10-22T02:43:48.190339670Z 
2021-10-22T02:43:48.190343070Z During handling of the above exception, another exception occurred:
2021-10-22T02:43:48.190346670Z 
2021-10-22T02:43:48.190350070Z Traceback (most recent call last):
2021-10-22T02:43:48.190353670Z   File "/opt/python/3.7.9/lib/python3.7/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
2021-10-22T02:43:48.190357470Z     worker.init_process()
2021-10-22T02:43:48.190361070Z   File "/opt/python/3.7.9/lib/python3.7/site-packages/gunicorn/workers/base.py", line 134, in init_process
2021-10-22T02:43:48.190364870Z     self.load_wsgi()
2021-10-22T02:43:48.190368470Z   File "/opt/python/3.7.9/lib/python3.7/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
2021-10-22T02:43:48.190373270Z     self.wsgi = self.app.wsgi()
2021-10-22T02:43:48.190379370Z   File "/opt/python/3.7.9/lib/python3.7/site-packages/gunicorn/app/base.py", line 67, in wsgi
2021-10-22T02:43:48.190383371Z     self.callable = self.load()
2021-10-22T02:43:48.190386971Z   File "/opt/python/3.7.9/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
2021-10-22T02:43:48.190390871Z     return self.load_wsgiapp()
2021-10-22T02:43:48.190394471Z   File "/opt/python/3.7.9/lib/python3.7/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
2021-10-22T02:43:48.190398371Z     return util.import_app(self.app_uri)
2021-10-22T02:43:48.190402571Z   File "/opt/python/3.7.9/lib/python3.7/site-packages/gunicorn/util.py", line 359, in import_app
2021-10-22T02:43:48.190406671Z     mod = importlib.import_module(module)
2021-10-22T02:43:48.190410271Z   File "/opt/python/3.7.9/lib/python3.7/importlib/__init__.py", line 127, in import_module
2021-10-22T02:43:48.190414171Z     return _bootstrap._gcd_import(name[level:], package, level)
2021-10-22T02:43:48.190417871Z   File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
2021-10-22T02:43:48.190421771Z   File "<frozen importlib._bootstrap>", line 983, in _find_and_load
2021-10-22T02:43:48.190425571Z   File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
2021-10-22T02:43:48.190429371Z   File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
2021-10-22T02:43:48.190433171Z   File "<frozen importlib._bootstrap_external>", line 728, in exec_module
2021-10-22T02:43:48.190437071Z   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2021-10-22T02:43:48.190440871Z   File "/tmp/8d99502e5af8c42/app.py", line 3, in <module>
2021-10-22T02:43:48.190444771Z     from Questgen import main
2021-10-22T02:43:48.190448371Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/Questgen/__init__.py", line 4, in <module>
2021-10-22T02:43:48.190452371Z     from Questgen.mcq import mcq
2021-10-22T02:43:48.190455972Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/Questgen/mcq/mcq.py", line 16, in <module>
2021-10-22T02:43:48.190459872Z     import pke
2021-10-22T02:43:48.190463472Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/pke/__init__.py", line 5, in <module>
2021-10-22T02:43:48.190467372Z     from pke.base import LoadFile
2021-10-22T02:43:48.190470972Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/pke/base.py", line 31, in <module>
2021-10-22T02:43:48.190474872Z     lang_stopwords = {get_alpha_2(l): l for l in stopwords._fileids}
2021-10-22T02:43:48.190478572Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/nltk/corpus/util.py", line 121, in __getattr__
2021-10-22T02:43:48.190482372Z     self.__load()
2021-10-22T02:43:48.190485972Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/nltk/corpus/util.py", line 86, in __load
2021-10-22T02:43:48.190492772Z     raise e
2021-10-22T02:43:48.190496472Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/nltk/corpus/util.py", line 81, in __load
2021-10-22T02:43:48.190500272Z     root = nltk.data.find(f"{self.subdir}/{self.__name}")
2021-10-22T02:43:48.190503972Z   File "/tmp/8d99502e5af8c42/antenv/lib/python3.7/site-packages/nltk/data.py", line 583, in find
2021-10-22T02:43:48.190507872Z     raise LookupError(resource_not_found)
2021-10-22T02:43:48.190511372Z LookupError: 
2021-10-22T02:43:48.190514972Z **********************************************************************
2021-10-22T02:43:48.190518572Z   Resource stopwords not found.
2021-10-22T02:43:48.190522572Z   Please use the NLTK Downloader to obtain the resource:
2021-10-22T02:43:48.190526273Z 
2021-10-22T02:43:48.190529773Z   >>> import nltk
2021-10-22T02:43:48.190533473Z   >>> nltk.download('stopwords')
2021-10-22T02:43:48.190537173Z 
2021-10-22T02:43:48.190540773Z   For more information see: https://www.nltk.org/data.html
2021-10-22T02:43:48.190544373Z 
2021-10-22T02:43:48.190547773Z   Attempted to load corpora/stopwords
2021-10-22T02:43:48.190551473Z 
2021-10-22T02:43:48.190554873Z   Searched in:
2021-10-22T02:43:48.190558373Z     - '/root/nltk_data'
2021-10-22T02:43:48.190561873Z     - '/opt/python/3.7.9/nltk_data'
2021-10-22T02:43:48.190565473Z     - '/opt/python/3.7.9/share/nltk_data'
2021-10-22T02:43:48.190568973Z     - '/opt/python/3.7.9/lib/nltk_data'
2021-10-22T02:43:48.190572573Z     - '/usr/share/nltk_data'
2021-10-22T02:43:48.190576073Z     - '/usr/local/share/nltk_data'
2021-10-22T02:43:48.190579573Z     - '/usr/lib/nltk_data'
2021-10-22T02:43:48.190583073Z     - '/usr/local/lib/nltk_data'
2021-10-22T02:43:48.190586673Z **********************************************************************
2021-10-22T02:43:48.190590373Z 
2021-10-22T02:43:48.206904802Z [2021-10-22 02:43:48 +0000] [39] [INFO] Worker exiting (pid: 39)
2021-10-22T02:43:48.768375965Z [2021-10-22 02:43:48 +0000] [36] [INFO] Shutting down: Master
2021-10-22T02:43:48.769032074Z [2021-10-22 02:43:48 +0000] [36] [INFO] Reason: Worker failed to boot.
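
Reading the second traceback above: the lookup for stopwords is triggered while line 3 of app.py runs `from Questgen import main` (through pke/base.py), that is, during import, before any later code in app.py can execute. A minimal sketch, assuming the app's nltk.download() calls currently sit after that import, is to check for and fetch the resource first. `REQUIRED`, `missing_resources`, and `prepare_nltk` are hypothetical names used only for illustration; `nltk.data.find` and `nltk.download` are the NLTK calls named in the log.

```python
# Resource name -> lookup path, both taken from the LookupError message in the log.
REQUIRED = {"stopwords": "corpora/stopwords"}


def missing_resources(required, find):
    """Return the names whose lookup path `find` cannot locate.

    `find` is any callable that raises LookupError for an absent resource,
    which is how nltk.data.find behaves.
    """
    missing = []
    for name, resource_path in required.items():
        try:
            find(resource_path)
        except LookupError:
            missing.append(name)
    return missing


def prepare_nltk(required=REQUIRED):
    """Download whatever is missing; meant to run before importing Questgen."""
    import nltk  # imported here so the helper above stays dependency-free

    for name in missing_resources(required, nltk.data.find):
        nltk.download(name, quiet=True)
```

In app.py, this would mean calling `prepare_nltk()` on a line above `from Questgen import main`, so the data is in place by the time pke asks for it.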
Azure Container Instances

Accepted answer
  1. Sintrias 96 Reputation points
    2021-10-25T02:29:11.207+00:00

    So I figured it out. I'm still learning Azure. It's a huge platform.

    The problem is that the server doesn't keep an instance of the project code running. Instead, some sort of interceptor watches for requests to the host URL and routes them to containers, each running a separate instance of the project code. When a container's process finishes its work, the container is destroyed, which means any data created inside it is deleted as well. This is a huge problem for my project, because it has a lot of live data that needs to persist in a single instance. Running multiple containers means the resource data has to be downloaded again for each container.
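
One way to avoid repeating the download for every container, sketched here under assumptions: the startup log notes that "Any data outside '/home' is not persisted", so the NLTK data could be kept in a folder under /home and registered on NLTK's search path. The folder `/home/nltk_data`, the `APP_NLTK_DATA_DIR` setting, and the function names are hypothetical; `nltk.data.path`, `nltk.data.find`, and `nltk.download(..., download_dir=...)` are existing NLTK APIs.

```python
import os
from pathlib import Path

# Hypothetical default; the log only says that data under /home is persisted.
DEFAULT_DATA_DIR = "/home/nltk_data"


def persistent_data_dir(environ=os.environ):
    """Pick the data folder, allowing an override via a hypothetical app setting."""
    return Path(environ.get("APP_NLTK_DATA_DIR", DEFAULT_DATA_DIR))


def register_data_dir(data_dir, search_path):
    """Create data_dir if needed and put it first in an NLTK-style path list."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    entry = str(data_dir)
    if entry not in search_path:
        search_path.insert(0, entry)
    return entry


def download_once(name, resource_path, data_dir):
    """Download `name` into data_dir only if NLTK cannot already find it.

    Returns True when a download was attempted, False when the data was present.
    """
    import nltk  # imported lazily; the helpers above need only the stdlib

    entry = register_data_dir(data_dir, nltk.data.path)
    try:
        nltk.data.find(resource_path)
        return False
    except LookupError:
        nltk.download(name, download_dir=entry, quiet=True)
        return True
```

A call such as `download_once("stopwords", "corpora/stopwords", persistent_data_dir())` would then fetch the corpus on the first start only and find it on disk afterwards, provided the folder really does survive restarts.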
