Azure Databricks - Split column based on special characters in Databricks

Jothi 11 Reputation points
2020-06-08T10:37:46.403+00:00

I have a column in my CSV file whose values can be in any of the formats below.
"Q1_1__Value_-_10_counts"
"Value_10_counts"
"Q1_1__1__value_yes"

These have to be split as below, respectively:
"Value_-_10_counts"
"Value_10_counts"
"value_yes"

When I try to split on '__', I get an error if the second format arrives. So I added an 'if' condition, as below.
Below is the Python code I used.
if '__' in df['column'].values:
    df[['column','newcolumn']] = df['column'].str.split('__', expand=True)
else:
    df['newcolumn'] = df['column']

The code enters the else branch even when the column contains '__'.
I also tried the below, but the result is always 'False':
exists = '__' in df.column
print(exists)

found = df[df['column'].str.contains('__')]
display(found)
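
A likely reason the `in` checks above return False: on a NumPy array (`df['column'].values`), `in` tests whether some element equals '__' exactly, and on a pandas Series it tests the index labels rather than the values. Neither looks for a substring. The sketch below illustrates the difference with a plain Python list standing in for the column (the name `values` is only for illustration). In pandas, the per-row substring test is `df['column'].str.contains('__')`, followed by `.any()` if a single boolean is needed.

```python
# Plain list standing in for the column values (sample data from the question).
values = ["Q1_1__Value_-_10_counts", "Value_10_counts", "Q1_1__1__value_yes"]

# Membership on the collection compares whole elements, so this is False.
whole_element_match = "__" in values

# A substring test has to be applied to each string individually.
per_row_match = ["__" in v for v in values]
any_match = any(per_row_match)

print(whole_element_match)  # False
print(per_row_match)        # [True, False, True]
print(any_match)            # True
```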

Kindly help: how should I handle this? Has anyone faced a similar situation?
Or let me know whether it would work to split on 'Q' first and then look for the first alphabetic character (which would skip the leading numbers and '__').
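
One possible way to cover all three formats without an if/else is sketched below. It assumes the wanted part is always whatever follows the last '__', and the whole value when there is no '__'. The helper name `strip_prefix` is hypothetical.

```python
def strip_prefix(value: str) -> str:
    """Return the text after the last '__', or the value unchanged if there is none."""
    # rsplit with maxsplit=1 splits at most once, starting from the right.
    return value.rsplit("__", 1)[-1]


samples = ["Q1_1__Value_-_10_counts", "Value_10_counts", "Q1_1__1__value_yes"]
results = [strip_prefix(s) for s in samples]
print(results)
# ['Value_-_10_counts', 'Value_10_counts', 'value_yes']
```

On a pandas DataFrame the same idea can be written as `df['newcolumn'] = df['column'].str.rsplit('__', n=1).str[-1]`, which leaves values without '__' unchanged.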

Azure Databricks