PySpark notebook - Synapse pipeline

arkiboys 9,596 Reputation points
2021-12-29T12:15:14.363+00:00

Hello,
Can you see what is wrong with the PySpark code below?

This PySpark code works fine:

...
from pyspark.sql.functions import expr

pExpr = 'upper'
pColName = 'Email'

vCDF = (
vCDF.withColumn(pColName, expr(pExpr + "(" + pColName + ")"))
)

display(vCDF.limit(100))

This PySpark code, which I use to test reading the config file, also works fine:

...
data_collect = vConfigExprDF.collect()
for row in data_collect:
    if (len(row["DataQuality"]) > 0):
        print(row["ColumnName"])
        pColName = row["ColumnName"]
        pExpr = row["DataQuality"]


Now I would like to combine the two snippets above: read the config one row at a time and build up the dataframe as I go. I am not sure what I am doing wrong. I get an error pointing to the first bracket in vCDF = ( and it says: no viable alternative at input (data_collect

...
from pyspark.sql.functions import expr
vCDF = (
    data_collect = vConfigExprDF.collect()
    for row in data_collect:
        if (len(row["DataQuality"]) > 0):
            print(row["ColumnName"])
            pColName = row["ColumnName"]
            pExpr = row["DataQuality"]

            vCDF.withColumn(pColName, expr(pExpr + "(" + pColName + ")"))
)

display(vCDF.limit(100))
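
A possible fix, offered as a sketch rather than a confirmed answer: Python only allows an expression inside parentheses, so an assignment and a for loop cannot sit inside vCDF = ( ... ). Since withColumn returns a new dataframe, the loop can instead run on its own and reassign vCDF on each pass. The helper names build_expressions and apply_expressions below are hypothetical, not part of PySpark, and the sketch assumes DataQuality holds a single function name such as upper, as in the working example. It also skips rows where DataQuality is None, which the original len() check would not handle.

```python
def build_expressions(config_rows):
    """Return (column, SQL expression) pairs for rows with a non-empty DataQuality.

    Hypothetical helper: config_rows is anything indexable by column name,
    for example the Row objects returned by vConfigExprDF.collect().
    """
    pairs = []
    for row in config_rows:
        col_name = row["ColumnName"]
        func_name = row["DataQuality"]
        if func_name:  # skips both None and empty strings
            pairs.append((col_name, func_name + "(" + col_name + ")"))
    return pairs


def apply_expressions(df, config_rows):
    """Apply each configured expression to df, one column at a time."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql.functions import expr

    for col_name, sql in build_expressions(config_rows):
        # withColumn returns a new dataframe, so the result must be reassigned.
        df = df.withColumn(col_name, expr(sql))
    return df


# Usage in the notebook (assumes vCDF and vConfigExprDF already exist):
# vCDF = apply_expressions(vCDF, vConfigExprDF.collect())
# display(vCDF.limit(100))
```

Collecting the config to the driver is reasonable here only because a config table is normally small; the data dataframe itself is never collected.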

.NET