Unable to figure out how to add data to existing data lake file

Sven Peeters 66 Reputation points
2021-04-06T13:23:15.38+00:00

Hi,

I'm using the Azure SDK for .NET to manipulate files on the data lake (Gen2).
Within an Azure Function, I would like to append some data to a CSV file stored on the data lake.

I came up with the method below, which should work according to the documentation (unless I did not fully understand it).

The problem is that the data is not 'flushed' to the file: the file keeps its original content.
I can't figure out what's going on here, I'm afraid :-(

Any tips?

Regards,
Sven Peeters

PS: I must add data incrementally; otherwise memory consumption can become an issue here.

public void AddFileContents(string fullPath, string content, string leaseId = null)
{
    DataLakeFileClient dataLakeFileClient = GetFileSystemClient().GetFileClient(fullPath);
    dataLakeFileClient.CreateIfNotExists();

    long currentLength = dataLakeFileClient.GetProperties().Value.ContentLength;

    byte[] byteArray = Encoding.UTF8.GetBytes(content);
    MemoryStream mStream = new MemoryStream(byteArray);
    long fileSize = mStream.Length;

    dataLakeFileClient.Append(mStream, currentLength, leaseId: leaseId);
    dataLakeFileClient.Flush(position: currentLength, close: true, conditions: new DataLakeRequestConditions() { LeaseId = leaseId });
}

Accepted answer
  1. Samara Soucy - MSFT 5,051 Reputation points
    2021-04-08T01:16:56.247+00:00

    This is very close. You should only need to make one small adjustment to the Flush() parameters on line 13: the position parameter must equal the length of the file after all data has been written, not the offset at which the append started.

    I was able to get this working by adding your fileSize variable to currentLength:

    dataLakeFileClient.Flush(position: currentLength + fileSize, close: true, conditions: new DataLakeRequestConditions() { LeaseId = leaseId });  
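
For reference, here is a minimal sketch of the whole method with that change applied. It reuses the same Append and Flush calls as the question, so it assumes an Azure.Storage.Files.DataLake package version that still accepts those overloads; newer versions may expect options objects instead. The class name, the FlushPosition helper, the early return for empty content, and passing the DataLakeFileClient in as a parameter (rather than calling the question's GetFileSystemClient()) are illustrative choices, not part of the original code.

```csharp
using System.IO;
using System.Text;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;

public static class DataLakeAppendSketch
{
    // Hypothetical helper: Flush needs the total file length after the append,
    // i.e. the offset the append started at plus the number of bytes appended.
    public static long FlushPosition(long currentLength, long appendedBytes)
    {
        return currentLength + appendedBytes;
    }

    // Assumption: the caller supplies the file client for the target path.
    public static void AddFileContents(DataLakeFileClient fileClient, string content, string leaseId = null)
    {
        fileClient.CreateIfNotExists();

        byte[] bytes = Encoding.UTF8.GetBytes(content);
        if (bytes.Length == 0)
        {
            return; // nothing to append
        }

        // The existing length is the offset at which the new data is staged.
        long currentLength = fileClient.GetProperties().Value.ContentLength;

        using (var stream = new MemoryStream(bytes))
        {
            // Same call shape as in the question (assumed to be available).
            fileClient.Append(stream, currentLength, leaseId: leaseId);
        }

        // Commit the staged data by flushing at the new end of the file.
        fileClient.Flush(
            position: FlushPosition(currentLength, bytes.Length),
            close: true,
            conditions: new DataLakeRequestConditions { LeaseId = leaseId });
    }
}
```

Only the bytes of the current call are held in memory, so calling the method repeatedly with small chunks keeps the incremental behaviour asked for in the question.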
    