NLTK download in Azure Function downloads at every call

2JK 241 Reputation points
2021-11-16T06:29:13.143+00:00

I have an Azure Function that does some text processing, mainly identifies some tags in sentences and removes prepositions, conjunctions, etc. For that I'm using NLTK. My function looks something like this:

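import nltk

def process_text(text):
    # Called on every invocation; this is the call the question is about.
    nltk.download('averaged_perceptron_tagger')
    # Tag each space-separated token with its part of speech.
    tagged_text = nltk.tag.pos_tag(text.split(' '))
    # Keep only the words whose tag is not in the excluded list.
    cleaned_text = [word for word,tag in tagged_text if tag not in ['VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'JJR', 'RB', 'NNP']]
    return cleaned_text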

However, nltk.download() seems to download the dependency every time the function is called, which slows everything down. I may be misunderstanding something, but my function is also missing a lot of text, and I'm assuming this is the cause.

Any way around this?

Azure Functions

1 answer

  1. MughundhanRaveendran-MSFT 11,621 Reputation points Microsoft Employee
    2021-12-07T17:01:30.003+00:00

    @2JK ,

    Thanks for reaching out to Q&A.

    I just want to isolate this issue. What is the behavior when you run this script on your local machine? Does it download every time you run the script?

    If the issue also occurs on your local machine, then it is related to NLTK itself. If it is only seen when the function is deployed to the Azure Functions runtime, then the download may be happening whenever the background VMs/servers (on the Azure platform side) that run the function change, especially on serverless plans such as Consumption and Elastic Premium. I am not 100% sure about this, though. I am not sure which plan you are running on; try a function app on a Dedicated App Service plan SKU and test it there.
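
Whichever plan is in use, one way to keep the download out of the per-call path is to check for the data locally first and only download when it is missing. The following is a minimal sketch of that idea, not a confirmed fix for this issue. The names ensure_resource, ensure_tagger, and NLTK_DATA_DIR are hypothetical, and using the temp directory as a writable location is an assumption that is not taken from the post. The resource name follows the original code; other NLTK releases may expect a different id.

```python
import os
import tempfile

# Assumption: the temp directory is writable on the host. It is used here
# only for illustration and is not taken from the original post.
NLTK_DATA_DIR = os.path.join(tempfile.gettempdir(), "nltk_data")

# Package ids already verified in this worker process.
_verified = set()


def ensure_resource(resource_path, package_id, find, download):
    """Call download(package_id) only if find(resource_path) raises LookupError.

    Returns True if a download was triggered, False otherwise.
    """
    if package_id in _verified:
        return False  # already checked in this process, skip all lookups
    try:
        find(resource_path)
        downloaded = False
    except LookupError:
        download(package_id)
        downloaded = True
    _verified.add(package_id)
    return downloaded


def ensure_tagger():
    """Wire the guard to NLTK (hypothetical helper).

    nltk is imported lazily so the guard above has no hard dependency on it.
    """
    import nltk

    # Let NLTK search the directory the data is downloaded to.
    if NLTK_DATA_DIR not in nltk.data.path:
        nltk.data.path.append(NLTK_DATA_DIR)
    return ensure_resource(
        "taggers/averaged_perceptron_tagger",
        "averaged_perceptron_tagger",
        find=nltk.data.find,
        download=lambda pkg: nltk.download(
            pkg, download_dir=NLTK_DATA_DIR, quiet=True
        ),
    )
```

With this in place, process_text could call ensure_tagger() instead of nltk.download(...), or the call could be made once at module import. A new instance would still download the data once on its first call, so this only removes the repeated work on an instance that is already running.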

    Please let me know your inputs.