How to add the BeautifulSoup library to Azure Databricks

Verma, Manish Kumar
2020-08-27T07:00:54.517+00:00

Hi all,
How do I add the BeautifulSoup library to Azure Databricks?

I want to run the code below in a PySpark notebook:

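from io import StringIO
from bs4 import BeautifulSoup
import pandas as pd

with open('C:/age0.html', 'r') as f:
    table = BeautifulSoup(f.read(), 'html.parser').find('table')

# read_html expects HTML text (not a bs4 Tag) and returns a list of DataFrames
df = pd.read_html(StringIO(str(table)))[0]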

  1. PRADEEPCHEEKATLA-MSFT, Microsoft Employee
    2020-08-27T11:23:55.497+00:00

    Hello @Verma, Manish Kumar,

    Welcome to the Microsoft Q&A platform.

    To install the BeautifulSoup library (the beautifulsoup4 package) on Azure Databricks, run the following in a notebook cell:

    %sh  
    pip install beautifulsoup4  
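A quick way to confirm the install worked is to try importing the package from Python in the next cell. Keep in mind that the pip distribution is named beautifulsoup4 while the import name is bs4. Also, %sh commands run only on the driver node, so both the install and this check cover the driver, which is where the question's BeautifulSoup and pandas code runs. The helper name try_import_version is hypothetical, used only for illustration.

```python
import importlib
from typing import Optional


def try_import_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, "unknown" if it has none, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


# beautifulsoup4 is imported as "bs4"
print("bs4:", try_import_version("bs4") or "not installed on this node")
```

If this prints "not installed on this node", the package did not land in the Python environment the notebook is using.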
    


    Tested on:

    Databricks Runtime version 7.0 (includes Apache Spark 3.0.0, Scala 2.12)

    Note: The file must be stored in the Databricks File System (DBFS) or in a mounted Azure Storage account. You cannot use a path on your local machine (such as C:/age0.html) in Azure Databricks.
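As a sketch of how the question's code might look once the file has been uploaded: on Databricks clusters, DBFS is typically also exposed to ordinary Python file APIs under the local /dbfs mount, so a file at dbfs:/FileStore/tables/age0.html can be opened as /dbfs/FileStore/tables/age0.html. That path and the helper names first_table_to_df and load_html_table are hypothetical; adjust them to your own upload location. The sketch passes an explicit parser to BeautifulSoup and wraps the table HTML in StringIO, because pd.read_html expects HTML text or a file-like object rather than a bs4 Tag, and it returns a list of DataFrames.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup


def first_table_to_df(html: str) -> pd.DataFrame:
    """Parse an HTML document and return its first <table> as a DataFrame."""
    table = BeautifulSoup(html, "html.parser").find("table")
    if table is None:
        raise ValueError("no <table> element found")
    # read_html wants HTML text / a file-like object, not a bs4 Tag,
    # and returns one DataFrame per table found; we keep the first.
    return pd.read_html(StringIO(str(table)))[0]


def load_html_table(path: str) -> pd.DataFrame:
    """Read an HTML file through the local file API and return its first table."""
    # Assumption: on Databricks, dbfs:/FileStore/... is reachable as /dbfs/FileStore/...
    with open(path, "r", encoding="utf-8") as f:
        return first_table_to_df(f.read())


# Usage in a Databricks notebook (hypothetical path, adjust to your upload location):
# df = load_html_table("/dbfs/FileStore/tables/age0.html")
# df.head()
```

pd.read_html also needs an HTML parser backend such as lxml (or html5lib alongside beautifulsoup4); if one is not already present on the cluster, it can be installed the same way as beautifulsoup4.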


    Hope this helps. Do let us know if you have any further queries.
