An Apache Spark-based analytics platform optimized for Azure.
Hi guys,
How do we loop through each row in a DataFrame that contains a set of files?
storage_account_name = "storacct"
storage_account_access_key = ""

# Configure access to the ADLS Gen2 storage account
spark.conf.set("fs.azure.account.key.storacct0001.dfs.core.windows.net", storage_account_access_key)

# List the files under /STG and load the listing into a DataFrame
DBFileList = dbutils.fs.ls("abfss://******@storacct0001.dfs.core.windows.net/STG")
df = spark.createDataFrame(DBFileList)
I want to loop through each file name and store it into a different table. I tried the code below, but it only prints the column names; no row data is displayed.
for fi in df:
    print(fi)
Regards,
Navin
for row in df.collect():
    print(row.path)
Basically, for row in df.collect(): print(row.path) will do it. Keep in mind that collecting and looping is an anti-pattern, something you should only do when you have exhausted every other option; a vectorized solution or the DataFrame.apply() method is usually preferable. Anything else, let me know. Thanks guys!
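As a minimal sketch of the collect-and-loop pattern above: `spark` and `dbutils` only exist on a Databricks cluster, so here the `Row` objects returned by `df.collect()` are simulated with a namedtuple, and the paths and field names are illustrative placeholders.

```python
from collections import namedtuple

# Simulated result of df.collect(): each entry behaves like a Spark Row
# with a .path attribute (all values below are placeholders).
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])

rows = [
    FileInfo("abfss://container@account.dfs.core.windows.net/STG/a.csv", "a.csv", 100),
    FileInfo("abfss://container@account.dfs.core.windows.net/STG/b.csv", "b.csv", 200),
]

# The pattern from the answer: pull rows to the driver, then read each
# row's path attribute. On Databricks this would be: for row in df.collect():
paths = []
for row in rows:
    paths.append(row.path)

print(paths)
```

From here, the collected names could be inserted into another table; the point is that `collect()` returns plain `Row` objects whose fields are accessed by attribute, unlike iterating the DataFrame itself, which yields its columns.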
Iterating through pandas DataFrame objects is generally slow; iteration defeats the whole purpose of using a DataFrame. It is an anti-pattern and something you should only do when you have exhausted every other option. It is better to look for a list comprehension, a vectorized solution, or the DataFrame.apply() method instead of looping through the DataFrame.
List comprehension example:
result = [(x, y, z) for x, y, z in zip(df['column1'], df['column2'], df['column3'])]
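The zip-based comprehension above can be tried with plain Python lists standing in for the DataFrame columns (the column names and values here are placeholders, not from the original question):

```python
# Stand-ins for df['column1'], df['column2'], df['column3']
column1 = [1, 2, 3]
column2 = ["a", "b", "c"]
column3 = [True, False, True]

# Same list comprehension as in the answer: one tuple per row,
# built without an explicit loop body
result = [(x, y, z) for x, y, z in zip(column1, column2, column3)]

print(result)  # [(1, 'a', True), (2, 'b', False), (3, 'c', True)]
```

With a real pandas DataFrame, `zip(df['column1'], df['column2'], df['column3'])` iterates the three columns in lockstep the same way.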