Hello @CzarR and welcome to Microsoft Q&A.
I see you want to know how or why this code works. I notice it is extremely similar to the example in the Python documentation for concurrent.futures.
It has been a long time since I dealt with this particular nuance, but the only reading that makes sense to me is this:
future_to_url = {executor.submit(buildJsonFileDir, row,dest): row for row in dfHubFiles.rdd.collect()}
executor.submit takes a reference to a function, followed by the arguments to be passed to that function. dest has been previously defined. However, row has not been defined at the point where it first appears.
The binding of row is deferred to the for clause that comes later in the same expression. The row before for row in ... names each item produced by for row in dfHubFiles.rdd.collect(). Normally, when we see for x in y, it is followed by a code block; here it is not. Here the goal is to walk through the collection and build one key: value pair per item, with the future returned by executor.submit as the key and the row as the value.
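To make the deferred binding concrete, here is a minimal sketch showing that the comprehension behaves like an explicit loop. The function build_path and the sample rows are hypothetical stand-ins for buildJsonFileDir and dfHubFiles.rdd.collect(); they are assumptions for illustration, not your actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def build_path(row, dest):
    # Hypothetical stand-in for buildJsonFileDir
    return f"{dest}/{row}"


rows = ["a.json", "b.json", "c.json"]  # stand-in for dfHubFiles.rdd.collect()
dest = "/tmp/out"                      # assumed destination, defined beforehand

with ThreadPoolExecutor(max_workers=2) as executor:
    # Comprehension form: the for clause binds row first,
    # then the key and the value are evaluated for that row.
    future_to_row = {executor.submit(build_path, row, dest): row for row in rows}

    # Equivalent explicit loop
    future_to_row_loop = {}
    for row in rows:
        future = executor.submit(build_path, row, dest)
        future_to_row_loop[future] = row

results = sorted(f.result() for f in future_to_row)
results_loop = sorted(f.result() for f in future_to_row_loop)
print(results)
print(results == results_loop)
```

Both dictionaries map each future back to the row that produced it, which is what lets the later code report which input failed.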
Try running in the interpreter:
stuff = [1,2,3,4]
{print(f): f for f in stuff}
This should help illuminate some of what is happening. For a better explanation, I think you should check on StackOverflow. The construct is a comprehension, specifically a dictionary comprehension, and it is closely related to generator expressions. A set comprehension looks similar: a = {x for x in 'abracadabra' if x not in 'abc'}
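As a rough sketch of what the interpreter experiment above produces: print returns None, so every key in that dictionary is None and each entry overwrites the previous one. The second dictionary, squares, is an illustrative addition showing the more typical case where each key is distinct.

```python
stuff = [1, 2, 3, 4]

# print(f) runs once per element (so 1, 2, 3, 4 are printed),
# but it returns None, so all four entries share the key None.
result = {print(f): f for f in stuff}
print(result)

# With an expression that yields a distinct key, there is one entry per element.
squares = {f * f: f for f in stuff}
print(squares)
```

In the original code each call to executor.submit returns a different future object, so the keys are distinct and no entries are overwritten.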
In line 21:
print('%r generated an exception: %s' % (url,future.exception()))
The %r and %s are placeholders that are filled in with the values given after the % operator, namely url and future.exception().
This is printf-style string formatting. %r converts its value to a string with repr() and %s converts its value to a string with str().
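A small sketch of the difference between the two conversions follows. The url value and the ValueError are made-up sample data standing in for the real url and future.exception().

```python
url = "http://example.com/"        # sample value, not from the original code
error = ValueError("bad value")    # stand-in for future.exception()

# %r uses repr(), so the string keeps its quotes; %s uses str()
message = '%r generated an exception: %s' % (url, error)
print(message)

# The same value formatted both ways
both = '%r vs %s' % (url, url)
print(both)
```

Using %r for the url makes it easy to spot stray whitespace or an empty string in the log output, since the quotes mark where the value starts and ends.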