Differences in row counting using Spark and pandas readers

Question

I'm reading the same CSV once in Scala with Spark and once in Python with pandas. This is the code I'm using:

val tabella = spark.read.option("header",true).option("mode", "DROPMALFORMED").csv("/FileStore/tables/IMMOBILI_MDRE_FACT_FENICE_INNER_DWH_CREDITI_2.csv")

tabella = pd.read_csv("/dbfs/FileStore/tables/IMMOBILI_MDRE_FACT_FENICE_INNER_DWH_CREDITI_2.csv")

When I count the rows in each case, I get a different number.

Accepted Answer

Hello @Auricchio Valerio,

Welcome to the Microsoft Q&A platform.

I tested the same on our end, and I get the same row count using Scala with Spark and Python with pandas.

Check out the results:

Using dataframe.count: [screenshot]

Using display(dataframe): [screenshot]
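If the counts still differ on your file, a quick diagnostic along these lines might help narrow it down. This is only a sketch that uses Python's standard csv module, not Spark or pandas themselves; the function name csv_row_diagnostics and the sample data are made up for illustration. It counts physical lines, logical CSV records (a quoted field can span several lines), and records whose field count does not match the header. Rows of that last kind are the ones that the DROPMALFORMED mode in your Spark code may drop, and quoted line breaks can be split into extra rows by Spark when the multiLine option is off, so either could plausibly explain a mismatch.

```python
import csv
import io


def csv_row_diagnostics(text):
    """Count physical lines, logical CSV records, and records whose
    number of fields differs from the header (hypothetical helper)."""
    # Physical data lines: every non-empty line after the header line.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    physical = len(lines) - 1

    # Logical records: the csv module keeps a quoted field that contains
    # line breaks together as a single record.
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    records = [row for row in reader if row]

    # Records with too few or too many fields compared to the header.
    mismatched = sum(1 for row in records if len(row) != len(header))

    return {
        "physical_lines": physical,
        "logical_records": len(records),
        "mismatched_field_count": mismatched,
    }


# Made-up sample: one quoted multi-line field and one row with an extra field.
sample = 'id,note\n1,"first line\nsecond line"\n2,ok\n3,too,many\n'
result = csv_row_diagnostics(sample)
print(result)
```

To try it on the real file, read the contents with open(path, newline="") and pass the resulting string to the function. If physical_lines and logical_records differ, or mismatched_field_count is not zero, the two readers are likely treating some rows differently.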

Hope this helps. Do let us know if you have any further queries.
