Azure Databricks Python for loop, read row
Hi guys,
How do we loop through each row in a DataFrame that holds a set of files?
storage_account_name = "storacct"
storage_account_access_key = ""
spark.conf.set("fs.azure.account.key.storacct0001.dfs.core.windows.net", storage_account_access_key)

# store the blob file information to a list
DBFileList = dbutils.fs.ls("abfss://databrickstg@storacct0001.dfs.core.windows.net/STG")

# convert the list to a DataFrame
df = spark.createDataFrame(DBFileList)
I want to loop through each file name and store it into a different table. I tried the code below, but it only prints the column names and no row data.

for fi in df:
    print(fi)
Regards,
Navin
3 answers
-
Dondapati, Navin 281 Reputation points
2020-11-20T03:09:55.793+00:00
Got it:

for row in df.collect():
    print(row.path)
-
Evan Chatter 16 Reputation points
2021-04-21T06:10:55.827+00:00 Iterating through pandas DataFrame objects is generally slow. Row-by-row iteration defeats the whole purpose of using a DataFrame; it is an anti-pattern and something you should only do when you have exhausted every other option. It is better to look for a list comprehension, a vectorized solution, or the DataFrame.apply() method instead of looping through the DataFrame.
List comprehensions example
result = [(x, y, z) for x, y, z in zip(df['column1'], df['column2'], df['column3'])]
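As a runnable sketch of the list-comprehension approach next to the DataFrame.apply() alternative mentioned above, assuming a toy pandas frame (the column names here are made up, not from the original question):

```python
import pandas as pd

# Toy frame standing in for a DataFrame with three columns of interest
df = pd.DataFrame({
    "column1": [1, 2],
    "column2": [3, 4],
    "column3": [5, 6],
})

# List comprehension over zipped columns: no per-row Python overhead from pandas
result = [(x, y, z) for x, y, z in zip(df["column1"], df["column2"], df["column3"])]

# apply() alternative: row-wise, usually slower but sometimes more readable
result_apply = df.apply(
    lambda r: (r["column1"], r["column2"], r["column3"]), axis=1
).tolist()
```

Both produce the same list of row tuples; the zip() version avoids constructing a pandas Series per row.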
-
Anonymous
2022-05-02T18:38:41.48+00:00 Basically, this will help:

for row in df.collect():
    print(row.path)

That said, iterating row by row is an anti-pattern and something you should only do when you have exhausted every other option; prefer a vectorized solution or the DataFrame.apply() method where possible. Anything else, let me know. Thanks guys!