Thank you for using the Microsoft Q&A platform and thanks for posting your query here.
From the description of your query, I understand that you want to skip rows from the DataFrame in a Synapse notebook and also split a single column into multiple columns on a pipe delimiter. Please let me know if my understanding is incorrect.
- Try using a lambda function in PySpark to skip the rows (a minimal sketch follows after these links). Here are a few helpful resources:
- how to skip first few rows from data file in pyspark
- How to skip lines while reading a CSV file as a dataFrame using PySpark?
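As a rough sketch of that approach, assuming the file is read as raw text and the first few rows should be dropped (the path and skip count below are placeholders, not values from your environment), you could pair `zipWithIndex` with a lambda filter:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder path - replace with your actual file location
rdd = spark.sparkContext.textFile("abfss://<container>@<account>.dfs.core.windows.net/data.txt")

# Skip the first N rows by pairing each line with its index and filtering on it
skip_rows = 5  # illustrative value
df1 = (rdd.zipWithIndex()
          .filter(lambda pair: pair[1] >= skip_rows)
          .map(lambda pair: (pair[0],))
          .toDF(["col1"]))
```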
- For splitting the column, you can make use of the split() function on top of df1:
```python
from pyspark.sql.functions import split

# Escape the pipe because split() treats the pattern as a regular expression
df2 = df1.withColumn("temp", split(df1['col1'], '\\|'))

# Number of fields produced by the split, taken from the first row
num_cols = len(df2.select("temp").first()[0])
col_names = ["Prop" + str(i) for i in range(num_cols)]

# Promote each array element to its own column
for i in range(num_cols):
    df2 = df2.withColumn(col_names[i], df2.temp[i])
df2 = df2.drop("temp")
df2.show()
```
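For illustration, with a hypothetical input such as the one below, the snippet produces Prop0, Prop1, and Prop2 columns alongside col1:

```python
# Hypothetical sample input to try the snippet above
df1 = spark.createDataFrame([("A|B|C",), ("D|E|F",)], ["col1"])
```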
Here is the reference documentation: PySpark split() Column into Multiple Columns
Hope this helps. Please accept the answer if it was helpful. Thanks