Very slow and long-running simple query

Question

from azureml.opendatasets import NycTlcGreen

data = NycTlcGreen()
df = data.to_spark_dataframe()

# Display 10 rows

display(df.limit(10))

The code above ran for over 40 minutes without ever finishing. Configuration: 8 vCPUs / 64 GB, 3 nodes.

Any help would be appreciated. There is nothing in the queue and no previous job, and the Spark pool uses the basic configuration.

Many thanks for any hint.

Answer

Hello @Morpheuss,

Welcome to the Microsoft Q&A platform.

We have not seen this behaviour with Synapse Apache Spark pools to date.

This issue looks unusual. If you have a support plan, you may file a support ticket for a deeper investigation and immediate assistance.

In my test on a Synapse Apache Spark pool of size Medium (8 vCores / 64 GB):

On a new cluster, it took nearly 3 min 20 s.

On an already running cluster, it took just 15 s.
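To compare a cold start with a warm cluster in the same way, the run can be timed directly in the notebook. The following is only a minimal sketch: the helper name `timed` is hypothetical and not part of any Synapse or Spark API, and the workload here is a stand-in so that the snippet runs outside a Spark session.

```python
import time


def timed(action):
    """Run a zero-argument callable and return (result, elapsed_seconds).

    Hypothetical helper for illustration; not part of Spark or Synapse.
    """
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs without a Spark session.
result, seconds = timed(lambda: sum(range(1_000_000)))
print(f"finished in {seconds:.3f} s")

# In a Synapse notebook, the same helper could wrap the query from the question
# (assuming `df` is the Spark DataFrame created there):
#   rows, seconds = timed(lambda: df.limit(10).collect())
```

Running the timed cell once right after the pool starts and once again in the same session should show whether most of the time is spent on session startup or on the query itself.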

Hope this helps. Do let us know if you have any further queries.

---------------------------------------------------------------------------

Please "Accept the answer" if the information helped you. This will help us and others in the community as well.
