Error while loading spaCy's "en_core_web_md"

Tinniam V Ganesh, 2022-05-18

Hi,
I am trying to load "en_core_web_md". Every time I do so I get the error:

    OSError: [E050] Can't find model 'en_core_web_md'. It doesn't seem to be a Python package or a valid path to a data directory.

This happened for a few days. Then it suddenly started working again for a few hours. I cloned the notebook and executed it, and got the error all over again in the cloned notebook. Now I also get the error in the original notebook.

Please let me know how to fix this. I have tried solutions from Stack Overflow.

Let me know

Ganesh

Tagged: Azure Databricks

2 answers

  1. PRADEEPCHEEKATLA (Moderator), 2022-05-30

    Hello @Tinniam V Ganesh ,

    Apologies for the delay in response.

    Per the repro from our end, it's working as expected without any error:

    • Cluster - 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)
    • Python - 3.8.10
    • spaCy version 3.3


    I'm able to run it without any issue. The original screenshots are not reproduced here; a sketch reconstructing the repro follows:
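    This is roughly what the working repro looked like on a Databricks notebook (a hedged sketch, not the exact screenshots; the %pip wheel URL follows spaCy's standard release naming and is an assumption). The %pip magics must run as their own cells, so they are shown as comments:

    # Install into the notebook-scoped environment first (each line as its own cell):
    #   %pip install spacy==3.3.0
    #   %pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")
    print([(token.text, token.pos_) for token in doc])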

    For more details, refer to NLP with Python and spaCy - First Steps (Python).

    Hope this helps. Please let us know if you have any further queries.


  2. Tinniam V Ganesh, 2022-05-19

    Hi @PRADEEPCHEEKATLA,
    Here are the details:

    • Cluster - 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12)
    • Python - 3.8.10
    • spaCy version 3.3

    Code snippet:

    import spacy
    !python -m spacy download en_core_web_sm
    from spacy import displacy

    nlp = spacy.load("en_core_web_sm")
    # Process whole documents
    text = ("When Sebastian Thrun started working on self-driving cars at "
            "Google in 2007, few people outside of the company took him "
            "seriously. “I can tell you very senior CEOs of major American "
            "car companies would shake my hand and turn away because I wasn’t "
            "worth talking to,” said Thrun, in an interview with Recode earlier "
            "this week.")
    doc = nlp(text)

    The error I get is "OSError: [E050] Can't find model 'en_core_web_sm'. It doesn't seem to be a Python package or a valid path to a data directory."

    I also see these messages; I don't know if they are relevant:

    /databricks/python3/lib/python3.8/site-packages/spacy/util.py:845: UserWarning: [W094] Model 'en_core_web_sm' (2.2.5) specifies an under-constrained spaCy version requirement: >=2.2.2. This can lead to compatibility problems with older versions, or as new spaCy versions are released, because the model may say it's compatible when it's not. Consider changing the "spacy_version" in your meta.json to a version range, with a lower and upper pin. For example: >=3.3.0,<3.4.0
      warnings.warn(warn_msg)

    Also, this message appears when installing 'en_core_web_sm':

    "Defaulting to user installation because normal site-packages is not writeable"

    As I had mentioned, this worked briefly and then stopped.
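    For what it's worth, those two messages point at a likely cause: the `!python -m spacy download` subprocess fell back to a user install ("normal site-packages is not writeable"), and the W094 warning shows an old 2.2.5 model package being picked up against spaCy 3.3. A hedged workaround sketch, installing the matching model wheel with Databricks' notebook-scoped %pip so the notebook's interpreter can import it (the wheel URL follows spaCy's standard release naming and is an assumption; %pip typically resets the notebook's Python state, so do the imports in a later cell):

    # Run as its own notebook cell; installs into the notebook-scoped environment:
    #   %pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl

    # Then, in a later cell:
    import spacy

    nlp = spacy.load("en_core_web_sm")  # should now resolve the 3.3.0 package
    print(nlp.meta["version"])          # hedged expectation: prints 3.3.0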

