I need to create a dataset in Azure Machine Learning service from an Azure Data Lake Storage Gen2 account registered as a datastore. The lake contains thousands of Avro files written by Event Hubs Capture following the pattern [EventHub]/[Partition]/[YYYY]/[MM]/[DD]/[HH]/[mm]/[ss], so each file has its own path.
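To make the layout concrete, here is a minimal sketch of how the paths accumulate. The hub name `myhub`, the 4 partitions, and the 5-minute capture window are assumptions for illustration only, and `capture_path` and `paths_for_window` are hypothetical helpers, not part of any Azure SDK.

```python
from datetime import datetime, timedelta


def capture_path(event_hub, partition, ts):
    """Build one capture path: [EventHub]/[Partition]/[YYYY]/[MM]/[DD]/[HH]/[mm]/[ss]."""
    return f"{event_hub}/{partition}/{ts:%Y/%m/%d/%H/%M/%S}"


def paths_for_window(event_hub, partitions, start, end, interval):
    """List every path written between start (inclusive) and end (exclusive),
    assuming each partition writes exactly one file per capture interval."""
    paths = []
    ts = start
    while ts < end:
        for partition in range(partitions):
            paths.append(capture_path(event_hub, partition, ts))
        ts += interval
    return paths


# Assumed setup: 4 partitions, one file per partition every 5 minutes.
day = paths_for_window(
    "myhub", 4,
    datetime(2020, 1, 1), datetime(2020, 1, 2),
    timedelta(minutes=5),
)
print(day[0])    # first path of the day
print(len(day))  # number of distinct paths for a single day
```

Under these assumptions a single day already produces well over 100 distinct paths, and the count grows for as long as capture runs.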
The datasets documentation recommends "... creating dataset referencing less than 100 paths in datastores for optimal performance."
What would be the recommended alternative for my application? The Event Hub captures streaming data continuously, so the number of files keeps growing.