How to read this multiline CSV file as a Spark DataFrame
Hello,
Below is sample data from a multiline CSV file delimited with semicolons (;):
Reference no;"Status";"Proj";"Series";"Note";
99V2A0001;"Draft";"PEV";"VP";"PVO";
89V2Z0001;"Accepted";"L541";"VP1";"Person could not catch the delay. Supplier deliver LU:2019/12/23 WD3
Moden Wood:20/01/98 W15
Fl";
99C939993;"Accepted";"V31";"V12";"frigerant, ThermalHeater and Coordinates. The interim sol is to "run the time"
VU1 plann";
99V2A0B01;"Accepted";"519A";"B89";"Problem 1: The 59 TT series were planned to get the "73"/TT but RT 18w44
they VP series "1"/ZP, this";
I tried creating a Spark DataFrame with the code in the attached image, but the resulting DataFrame has 6 rows, whereas the input file has only 4 data rows.
I believe the extra double quotes inside the last column of rows 3 and 4 prevent Spark from reading each of those records as a single row. How can I resolve this? (The expected number of output rows is 4.)
Note: The extra quotes are the inner ones in the last column, around "run the time", "73", and "1".
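Because the inner quotes are not escaped (neither doubled as "" nor backslash-escaped), the file is not strictly valid CSV, so a CSV reader's quote handling alone may not reassemble rows 3 and 4. One possible workaround is to split records yourself before building the DataFrame. The sketch below is plain Python and assumes, based only on the four sample rows, that every record ends with `";` at the end of a physical line, that the first column is unquoted, and that only the last column can contain stray quotes. `RECORD_END`, `split_records`, `parse_record`, and `SAMPLE` are hypothetical names introduced for illustration, not part of Spark.

```python
import re

# Assumption (from the sample only): each logical record ends with a closing
# quote plus the ';' delimiter at the end of a physical line, i.e. `";` + newline.
# A line break inside the last column is never preceded by `";`.
RECORD_END = re.compile(r'(?<=";)\r?\n')


def split_records(text: str) -> list[str]:
    """Split raw file text into logical records at each `";`-terminated line."""
    return [rec for rec in RECORD_END.split(text) if rec.strip()]


def parse_record(record: str, n_fields: int = 5) -> list[str]:
    """Split one record into fields without interpreting inner quotes.

    Assumes the first field is unquoted, every other field is wrapped in
    double quotes, and only the LAST field may contain stray quotes or `;"`.
    """
    body = record.rstrip().removesuffix(";")  # drop the trailing delimiter
    # Splitting on `;"` at most n_fields - 1 times keeps the last field intact.
    first, *rest = body.split(';"', n_fields - 1)
    # Remove exactly one closing quote; inner quotes are kept as literal text.
    return [first] + [field.removesuffix('"') for field in rest]


SAMPLE = '''Reference no;"Status";"Proj";"Series";"Note";
99V2A0001;"Draft";"PEV";"VP";"PVO";
89V2Z0001;"Accepted";"L541";"VP1";"Person could not catch the delay. Supplier deliver LU:2019/12/23 WD3
Moden Wood:20/01/98 W15
Fl";
99C939993;"Accepted";"V31";"V12";"frigerant, ThermalHeater and Coordinates. The interim sol is to "run the time"
VU1 plann";
99V2A0B01;"Accepted";"519A";"B89";"Problem 1: The 59 TT series were planned to get the "73"/TT but RT 18w44
they VP series "1"/ZP, this";
'''

if __name__ == "__main__":
    header, *rows = [parse_record(r) for r in split_records(SAMPLE)]
    print(header)
    print(len(rows))  # 4 data rows for the sample above
    for row in rows:
        print(row[0], repr(row[-1]))
```

In PySpark, the same helpers could in principle be applied to each file's full text, for example via `spark.sparkContext.wholeTextFiles(path)` followed by a `flatMap`, dropping the header row and then calling `spark.createDataFrame`. That wiring is untested here, and `wholeTextFiles` loads each file into memory as a single string, so it suits moderately sized files only.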
Regards,
Vishal