How to add the BeautifulSoup library in Azure Databricks

Verma, Manish Kumar 131 Reputation points
2020-08-27T07:00:54.517+00:00

Hi all,
How do I add the BeautifulSoup library in Azure Databricks?

I want to run the code below in a PySpark notebook:

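from io import StringIO
from bs4 import BeautifulSoup
import pandas as pd

# Parse the HTML file and locate the first <table> element
with open('C:/age0.html', 'r') as f:
    table = BeautifulSoup(f.read(), 'html.parser').find('table')
# read_html expects HTML text (not a Tag) and returns a list of DataFrames
df = pd.read_html(StringIO(str(table)))[0]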

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. PRADEEPCHEEKATLA-MSFT 89,571 Reputation points Microsoft Employee
    2020-08-27T11:23:55.497+00:00

    Hello @Verma, Manish Kumar ,

    Welcome to the Microsoft Q&A platform.

    To install the BeautifulSoup library on Azure Databricks, run the following in a notebook cell:

    %sh  
    pip install beautifulsoup4  
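
After the install cell finishes, a quick check from a Python cell can confirm that the package is visible to the notebook. This is only a minimal sketch: the helper name `is_importable` is hypothetical and not part of Databricks or BeautifulSoup, and it only tests whether the top-level module can be found, not whether it works end to end.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located in this Python environment."""
    return importlib.util.find_spec(module_name) is not None


# beautifulsoup4 is imported under the module name "bs4"
print("bs4 importable:", is_importable("bs4"))
```

If this prints `False`, the package was probably installed into a different Python environment than the one the notebook uses, and the install step may need to be repeated.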
    


    Tested on:

    Databricks Runtime Version 7.0 (includes Apache Spark 3.0.0, Scala 2.12)

    Note: The file must be located in the Databricks File System (DBFS) or in a mounted Azure Storage account. You cannot use a local path such as C:/age0.html in Azure Databricks.
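
As a rough illustration of the note above, the sketch below reads an HTML file through the `/dbfs/...` path that Python's `open()` sees on clusters where the DBFS mount is available. The helper names `dbfs_to_local` and `read_first_table`, and the upload location `dbfs:/FileStore/age0.html`, are assumptions made for this example. Instead of `pd.read_html`, it builds the DataFrame by hand with BeautifulSoup and assumes the first table row holds the column headers.

```python
def dbfs_to_local(path: str) -> str:
    """Translate a 'dbfs:/...' URI into the '/dbfs/...' path used by Python's open()."""
    if path.startswith("/dbfs/"):
        return path
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    raise ValueError(f"not a DBFS path: {path!r}")


def read_first_table(path: str):
    """Parse the first <table> in an HTML file on DBFS into a pandas DataFrame."""
    # Imported here so the path helper above can be used before the packages are installed.
    from bs4 import BeautifulSoup
    import pandas as pd

    with open(dbfs_to_local(path), "r") as f:
        soup = BeautifulSoup(f.read(), "html.parser")

    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found")

    # One list of cell texts per table row; header cells (<th>) and data cells (<td>) alike
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    if not rows:
        raise ValueError("table has no rows")

    # Assumption: the first row contains the column names
    return pd.DataFrame(rows[1:], columns=rows[0])


# Example call (hypothetical upload location):
# df = read_first_table("dbfs:/FileStore/age0.html")
```

Only the driver's local file API is used here, so this suits small files; the path and the header-row assumption would need adjusting for a real dataset.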


    Hope this helps. Do let us know if you have any further queries.

    ----------------------------------------------------------------------------------------

    Please click "Accept Answer" and upvote the post that helps you; this can be beneficial to other community members.

