How to read this multiline CSV file as a Spark DataFrame

Vishal D 5 Reputation points
2023-06-26T07:52:30.4433333+00:00

Hello,

Below is sample data from a multiline CSV file delimited with semicolons (;):

Reference no;"Status";"Proj";"Series";"Note";

99V2A0001;"Draft";"PEV";"VP";"PVO";

89V2Z0001;"Accepted";"L541";"VP1";"Person could not catch the delay. Supplier deliver LU:2019/12/23 WD3

Moden Wood:20/01/98 W15

Fl";

99C939993;"Accepted";"V31";"V12";"frigerant, ThermalHeater and Coordinates. The interim sol is to "run the time"

VU1 plann";

99V2A0B01;"Accepted";"519A";"B89";"Problem 1: The 59 TT series were planned to get the "73"/TT but RT 18w44

they VP series "1"/ZP, this";

I tried creating a Spark DataFrame with the code in the attached image, but the output DataFrame has 6 rows, whereas the input file has only 4 rows.

Screenshot (172)

Because of the extra quotes ("") present in the last column of rows 3 and 4, I believe Spark was not able to read each of those records as a single row. Please help me resolve this issue. (The expected number of output rows is 4.)

Note: I've highlighted the extra quotes in bold for your reference

Regards,

Vishal
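
One possible workaround, sketched here in plain Python, is to split the raw text into records before it reaches the CSV parser. This is only a sketch under assumptions taken from the sample above: every record ends with a closing quote and a semicolon at the end of a line, the first field is unquoted, all other fields are quoted, and the sequence ";" never appears inside a field. The function name parse_records and the SAMPLE string (abridged from the data above) are hypothetical.

```python
import re

# Abridged copy of the sample data, with unescaped quotes inside the last field.
SAMPLE = '''Reference no;"Status";"Proj";"Series";"Note";
99V2A0001;"Draft";"PEV";"VP";"PVO";
99C939993;"Accepted";"V31";"V12";"The interim sol is to "run the time"
VU1 plann";
99V2A0B01;"Accepted";"519A";"B89";"planned to get the "73"/TT but RT 18w44
they VP series "1"/ZP, this";
'''


def parse_records(text):
    """Split semicolon-delimited text with unescaped quotes into rows of fields."""
    # Assumption: a record ends at a closing quote plus semicolon
    # that is followed by a line break (or the end of the text).
    chunks = re.split(r'";[ \t]*(?:\r?\n|$)', text)
    rows = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue  # skip blank gaps between records
        # The first field is unquoted, so the first separator is ;"
        # and every later separator is ";"
        first, _, rest = chunk.partition(';"')
        rows.append([first] + rest.split('";"'))
    return rows


rows = parse_records(SAMPLE)
print(len(rows) - 1)  # data rows, header excluded; prints 3
```

If the file is small enough to be read as a single string on the driver, the resulting rows could then be passed to spark.createDataFrame(rows[1:], schema=rows[0]). Quotes and line breaks inside the Note column are kept exactly as they appear in the file.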

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.