Azure Data Lake Storage directory listing: JSON serialization error

Cebiroglu Gökhan (MT-A)
2020-06-09T14:52:41.037+00:00

Hi,

I am having trouble listing the directories in Azure Data Lake Storage. I am essentially using the default template for listing the directories in a file system from data-lake-storage-directory-file-acl-dotnet. I have wrapped the code in a unit test, but I get what looks like a JSON serialization issue. This is the async task I am calling from a unit test method:

```csharp
public async Task<List<string>> ListFilesInDirectory(string Directory)
{
    // GetPathsAsync returns an AsyncPageable<PathItem>; walk it with its async enumerator.
    IAsyncEnumerator<PathItem> enumerator = _dataLakeFileSystemClient.GetPathsAsync(Directory).GetAsyncEnumerator();
    List<string> somelist = new List<string>();

    // Only read Current after MoveNextAsync has returned true.
    PathItem item = await enumerator.MoveNextAsync() ? enumerator.Current : null;
    while (item != null)
    {
        Console.WriteLine(item.Name);
        somelist.Add(item.Name);
        if (!await enumerator.MoveNextAsync())
        {
            break;
        }

        item = enumerator.Current;
    }

    return somelist;
}
```
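For comparison, here is a minimal sketch of the same loop written with `await foreach`, which calls MoveNextAsync, checks its result and disposes the enumerator for you. It assumes .NET Core 3.0 or later (C# 8); `AsyncListingDemo`, `CollectAsync` and `FakeListingAsync` are hypothetical names, and the fake listing only stands in for the service. Azure SDK pageables such as the `AsyncPageable<PathItem>` returned by `GetPathsAsync` implement `IAsyncEnumerable<T>`, so the same loop applies to them and follows continuation pages automatically. This changes only the loop shape; it would not by itself avoid the error reported below.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncListingDemo
{
    // Collects every item from an async sequence (hypothetical helper).
    public static async Task<List<T>> CollectAsync<T>(IAsyncEnumerable<T> source)
    {
        var items = new List<T>();
        // await foreach calls MoveNextAsync, stops when it returns false,
        // and disposes the enumerator at the end.
        await foreach (T item in source)
        {
            items.Add(item);
        }
        return items;
    }

    // Hypothetical stand-in for a service listing, used only to exercise CollectAsync.
    public static async IAsyncEnumerable<string> FakeListingAsync(params string[] names)
    {
        foreach (string name in names)
        {
            await Task.Yield(); // simulate asynchronous retrieval
            yield return name;
        }
    }
}
```

Against the real client this would read `List<PathItem> paths = await AsyncListingDemo.CollectAsync(_dataLakeFileSystemClient.GetPathsAsync(Directory));`.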

This is my error message.

    System.AggregateException
      HResult=0x80131500
      Message=One or more errors occurred. ('<' is an invalid start of a value. LineNumber: 0 | BytePositionInLine: 0.)
      Source=System.Private.CoreLib
      StackTrace:
       at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
       at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
       at System.Threading.Tasks.Task.Wait()
       at HistoricMarketDataLibTests.HistoricIceDataManagerTests.DirectoryListing(String Directory) in C:\Users\bbf22\source\repos\HistoricMarketDataClient\BarDefinitionTests\HistoricIceDataManagerTests.cs:line 143

This exception was originally thrown at this call stack:

    System.Text.Json.ThrowHelper.ThrowJsonReaderException(ref System.Text.Json.Utf8JsonReader, System.Text.Json.ExceptionResource, byte, System.ReadOnlySpan)
    System.Text.Json.Utf8JsonReader.ConsumeValue(byte)
    System.Text.Json.Utf8JsonReader.ReadFirstToken(byte)
    System.Text.Json.Utf8JsonReader.ReadSingleSegment()
    System.Text.Json.Utf8JsonReader.Read()
    System.Text.Json.JsonDocument.Parse(System.ReadOnlySpan, System.Text.Json.Utf8JsonReader, ref System.Text.Json.JsonDocument.MetadataDb, ref System.Text.Json.JsonDocument.StackRowStack)
    System.Text.Json.JsonDocument.Parse(System.ReadOnlyMemory, System.Text.Json.JsonReaderOptions, byte[])
    System.Text.Json.JsonDocument.Parse(System.ReadOnlyMemory, System.Text.Json.JsonDocumentOptions)
    System.Text.Json.JsonDocument.Parse(string, System.Text.Json.JsonDocumentOptions)
    Azure.Storage.Files.DataLake.ErrorExtensions.CreateException(string, Azure.Core.Pipeline.ClientDiagnostics, Azure.Response)
    ... [Call Stack Truncated]

Inner Exception 1: JsonReaderException: '<' is an invalid start of a value. LineNumber: 0 | BytePositionInLine: 0.

I get `item == null`, so the loop never iterates over the PathItems at all. I wonder whether this is related to service request limits, since the data store is big, but in that case I would expect a more meaningful error code or message. What lies behind the JSON serialization issue? Is it related to the async tasks being invoked? I should also mention that I can query the existence of folders or files when I specify them, but whenever I call GetPathsAsync or GetPaths I run into trouble. The number of files involved is large, though. I wonder whether this causes some sort of service request issue and whether I should think about mapping file locations in a SQL-based backend.
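Reading the stack trace above, the JsonReaderException is thrown from `Azure.Storage.Files.DataLake.ErrorExtensions.CreateException`, that is, while the SDK was building an exception from an error response rather than while it was reading a successful listing. A `'<'` at byte position 0 means the body it tried to parse started with `<`, which usually indicates XML or HTML rather than JSON, so the underlying service error is hidden behind the parse failure. This is an inference from the trace, not a confirmed diagnosis. The sketch below reproduces that kind of parse failure with System.Text.Json alone; `GuessFormat` and `TryParseJson` are hypothetical helpers, and the sample bodies in the test are made up for illustration, not the actual response from the service.

```csharp
using System.Text.Json;

public static class ErrorBodyDemo
{
    // Hypothetical helper: guesses the format of a response body from its first
    // non-whitespace character. '<' usually means XML or HTML, not JSON.
    public static string GuessFormat(string body)
    {
        string trimmed = body.TrimStart();
        if (trimmed.Length == 0) return "empty";
        switch (trimmed[0])
        {
            case '<': return "xml-or-html";
            case '{':
            case '[': return "json";
            default: return "other";
        }
    }

    // Returns true if System.Text.Json can parse the body. A body starting with '<'
    // fails here with the same "'<' is an invalid start of a value" kind of error.
    public static bool TryParseJson(string body)
    {
        try
        {
            using (JsonDocument.Parse(body)) { }
            return true;
        }
        catch (JsonException)
        {
            // JsonReaderException (internal) derives from JsonException.
            return false;
        }
    }
}
```

Capturing the raw HTTP response (for example with a network capture tool such as Fiddler) and running a check like `GuessFormat` on it should show what the service actually returned.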

I should also say that I have had a bit more success using R Azure RMR to query the contents of some subfolders, but querying any of their parents takes a long time. While I have had no success querying any subfolder in .NET using the code above, I still suspect it may be related to some tasks not returning their results in time. But then again, these are just uneducated guesses.

I would appreciate any help; I am far from being an expert on the Azure .NET APIs and async tasks.


1 answer

  1. ChiragMishra-MSFT
    2020-06-10T07:15:08.923+00:00

    Hi @CebirogluGkhanMTA-4313,

    This error could be linked to your Azure Data Lake service client. Can you please make sure that you have followed the prerequisite steps mentioned here: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-dotnet#set-up-your-project.

    Once the service client points to the right connection, please check that the client is set up correctly by performing a simple operation such as creating a file system.
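The set-up section linked above builds the service client from the Data Lake (DFS) endpoint, `https://<account-name>.dfs.core.windows.net`. A cheap first check is that the URI handed to `DataLakeServiceClient` really is that endpoint. The helpers below are a hypothetical sketch; they assume the Azure public cloud suffix `core.windows.net` (sovereign clouds use other suffixes) and do not contact the service.

```csharp
using System;

public static class DataLakeEndpointCheck
{
    // Builds the DFS endpoint URI in the form used by the set-up article,
    // assuming the Azure public cloud suffix "core.windows.net".
    public static Uri BuildDfsUri(string accountName)
    {
        if (string.IsNullOrWhiteSpace(accountName))
            throw new ArgumentException("Account name is required.", nameof(accountName));
        return new Uri($"https://{accountName}.dfs.core.windows.net");
    }

    // True when the URI is HTTPS and targets the DFS endpoint rather than,
    // for example, the Blob endpoint (<account>.blob.core.windows.net).
    public static bool IsDfsEndpoint(Uri serviceUri)
    {
        return serviceUri.Scheme == Uri.UriSchemeHttps
            && serviceUri.Host.EndsWith(".dfs.core.windows.net", StringComparison.OrdinalIgnoreCase);
    }
}
```

With the URI confirmed, the smoke test suggested above could look like `var serviceClient = new DataLakeServiceClient(DataLakeEndpointCheck.BuildDfsUri(accountName), new StorageSharedKeyCredential(accountName, accountKey));` followed by `await serviceClient.CreateFileSystemAsync("test-filesystem");`, where `accountName`, `accountKey` and the file system name are placeholders.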

    Also, take a look at line 143 of HistoricIceDataManagerTests.cs, which is the line the stack trace points to. If you can share the exact line of code, it will help us diagnose the issue faster.
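The stack trace in the question shows that line 143 sits in the `DirectoryListing` test, which blocks on the task with `Task.Wait()`. Blocking wraps any failure in a `System.AggregateException`, whereas awaiting the task rethrows the original exception, which is easier to read. The minimal sketch below shows the difference without Azure; `ExceptionUnwrappingDemo` and `FailingOperationAsync` are hypothetical stand-ins for the test and the listing call.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionUnwrappingDemo
{
    // Hypothetical stand-in for a failing async call such as the directory listing.
    public static Task<int> FailingOperationAsync() =>
        Task.FromException<int>(new InvalidOperationException("simulated service error"));

    // Blocking with .Wait() surfaces the failure wrapped in an AggregateException.
    public static Type ExceptionTypeWhenBlocking()
    {
        try
        {
            FailingOperationAsync().Wait();
        }
        catch (Exception ex)
        {
            return ex.GetType();
        }
        return null;
    }

    // Awaiting rethrows the original exception unchanged.
    public static async Task<Type> ExceptionTypeWhenAwaiting()
    {
        try
        {
            await FailingOperationAsync();
        }
        catch (Exception ex)
        {
            return ex.GetType();
        }
        return null;
    }
}
```

In practice this means declaring the test method as `async Task` and writing `await ListFilesInDirectory(Directory);` instead of calling `.Wait()`; MSTest, NUnit and xUnit all support `async Task` test methods.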