pyspark notebook - synapse pipeline
Hello,
Can you see what is wrong with the PySpark code below?
This PySpark code works fine:
...
from pyspark.sql.functions import expr

pExpr = 'upper'
pColName = 'Email'
vCDF = (
    vCDF.withColumn(pColName, expr(pExpr + "(" + pColName + ")"))
)
display(vCDF.limit(100))
This PySpark code, which I use for testing that the config file can be read, works fine as well:
...
data_collect = vConfigExprDF.collect()
for row in data_collect:
    if (len(row["DataQuality"]) > 0):
        print(row["ColumnName"])
        pColName = row["ColumnName"]
        pExpr = row["DataQuality"]
Now I would like to combine the two snippets above so that the config is read one line at a time and the DataFrame is built up, but I am not sure what I am doing wrong. I get an error pointing at the first bracket (`vCDF = (`) that says: `no viable alternative at input (data_collect`
...
from pyspark.sql.functions import expr

vCDF = (
    data_collect = vConfigExprDF.collect()
    for row in data_collect:
        if (len(row["DataQuality"]) > 0):
            print(row["ColumnName"])
            pColName = row["ColumnName"]
            pExpr = row["DataQuality"]
            vCDF.withColumn(pColName, expr(pExpr + "(" + pColName + ")"))
)
display(vCDF.limit(100))