Improving System.IO.Pipelines buffer utilization

Question

Henric Jungheim 20 Reputation points

Our current mechanism for writing to a System.IO.Pipelines Pipe is a loop that writes small bits of data to the PipeWriter. Inside that loop, we call FlushAsync() after at least N bytes have been written. We discovered a case where MinimumSegmentSize was not an integer multiple of the size of those "small bits" we're writing. This resulted in the reader getting data with segments alternating in size between almost MinimumSegmentSize and nearly zero (the one "small bit" that made it into the buffer).

For now, our work-around is to detect that a buffer is as full as it's ever going to get by looking at the size of the buffer returned by .GetMemory(). We now do something like this:

  PipeOptions _pipeOptions = ... ;
  const int FlushThreshold = ... ;
...
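  await foreach (var thing in source.ReadEmAllAsync())
  {
    var size = thing.Size;
    var buffer = writer.GetMemory(size);

    // A buffer at least MinimumSegmentSize long is taken to mean that the
    // pipe has moved on to a fresh segment, so the previous segment is as
    // full as it is going to get.
    if (buffer.Length >= _pipeOptions.MinimumSegmentSize
      && writer.UnflushedBytes >= FlushThreshold)
    {
      await writer.FlushAsync();
      // Request the buffer again after the flush.
      buffer = writer.GetMemory(size);
    }

    thing.CopyTo(buffer.Span[..size]);
    writer.Advance(size);
  }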

The buffers presented to the reader are now almost always full, but is this really the right way to solve the problem? It doesn't seem right for the code using the PipeWriter to have to know so much about the pipe's internal buffering strategy. Is there some cleaner way to flush any and all completed buffers, without pushing through tiny partial buffers?

In our application, in addition to the obvious memory usage implications, this has a throughput impact since the code that reads from the pipe is more efficient when presented with large chunks of contiguous data. Write consolidation might not be a primary application for System.IO.Pipelines, but except for this glitch we are now working around, it works really well for that purpose.

The snippet above leaves out a call to .Advance(0) before the flush because that call results in zero-length segments being presented to the reader. Is leaving it out a correct usage of the API, given that it means calling .GetMemory() without a corresponding .Advance()?

Thanks.

Answer accepted by question author

Anonymous

Hi @Henric Jungheim, welcome to Microsoft Q&A.

Given the constraints and the need to manage buffer sizes efficiently without relying heavily on internal details, you can use Advance and FlushAsync more strategically and track how much space is left in the buffer returned by PipeWriter.GetMemory, flushing only when the next item no longer fits.

var remaining = int.MaxValue;
const int flushThreshold = 131070; // Adjust based on your needs

await foreach (var thing in source.ReadEmAllAsync())
{
    var size = thing.Size;

    // If the remaining space in the buffer is not enough for the new data
    if (size > remaining)
    {
        await writer.FlushAsync();
        remaining = int.MaxValue; // Reset remaining after flush
    }

    var buffer = writer.GetMemory(size);
    thing.CopyTo(buffer.Span[..size]);

    remaining = buffer.Length - size;
    writer.Advance(size);

    // Optionally, flush based on a threshold to avoid over-accumulating
    if (writer.UnflushedBytes >= flushThreshold)
    {
        await writer.FlushAsync();
        remaining = int.MaxValue; // Reset remaining after flush
    }
}

Best Regards,

Jiale

Henric Jungheim 20 Reputation points

2024-08-01T10:55:27.6966667+00:00

This is exactly what results in the pathological behavior. For the simplest example, consider a thing.Size that happens to always be 5 and a MinimumSegmentSize of 128 KiB. Using the suggested approach results in the reader seeing segment lengths like this:

131070 5 131070 5 131070 5 ...

This is what we are trying to avoid: it uses almost twice as much memory for buffering as it needs to, roughly doubles the number of times the reader is awakened, and uses more CPU than necessary because the code doing the reading has some per-segment overhead. Using something like this:

  var size = thing.Size;

  if (totalBytesWritten + size >= _pipeOptions.MinimumSegmentSize)
  {
    await writer.FlushAsync();
    totalBytesWritten = 0;
  }

  var buffer = writer.GetMemory(size);
  thing.CopyTo(buffer.Span[..size]);
  writer.Advance(size);
  totalBytesWritten += size;

would avoid this pathological behavior, but requires even more intimate knowledge of Pipe's internals. For the above example with thing.Size stuck at 5, the reader would then see segments like this:

131070 131070 131070 131070 131070 ...
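
The 131070 figure in the example above follows from integer division, assuming the pool hands the pipe a segment of exactly 128 KiB (131072 bytes). The sketch below works through that arithmetic; SegmentMath and Fill are hypothetical names used only for illustration, and the real segment size depends on the memory pool in use.

```csharp
public static class SegmentMath
{
    // Hypothetical helper: how many bytes of a segment are usable when every
    // item has the same size and items are never split across segments.
    public static (int UsedBytes, int WastedBytes) Fill(int segmentSize, int itemSize)
    {
        int items = segmentSize / itemSize;   // whole items that fit
        int used = items * itemSize;          // bytes actually written
        return (used, segmentSize - used);    // the leftover cannot hold another item
    }
}
```

With a segment of 131072 bytes and 5-byte items, 26214 items fit, which fills 131070 bytes and leaves 2 bytes unused. The next 5-byte item therefore lands in a new segment, and a flush issued right after it is what produces the lone 5-byte segment.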

Anonymous

2024-08-02T08:18:04.75+00:00

Given the constraints and the need to manage buffer sizes efficiently without relying heavily on internal details, you can use Advance and FlushAsync more strategically and track how much space is left in the buffer returned by PipeWriter.GetMemory, flushing only when the next item no longer fits.

var remaining = int.MaxValue;
const int flushThreshold = 131070; // Adjust based on your needs

await foreach (var thing in source.ReadEmAllAsync())
{
    var size = thing.Size;

    // If the remaining space in the buffer is not enough for the new data
    if (size > remaining)
    {
        await writer.FlushAsync();
        remaining = int.MaxValue; // Reset remaining after flush
    }

    var buffer = writer.GetMemory(size);
    thing.CopyTo(buffer.Span[..size]);

    remaining = buffer.Length - size;
    writer.Advance(size);

    // Optionally, flush based on a threshold to avoid over-accumulating
    if (writer.UnflushedBytes >= flushThreshold)
    {
        await writer.FlushAsync();
        remaining = int.MaxValue; // Reset remaining after flush
    }
}

Anonymous

2024-08-09T09:33:54.1466667+00:00

Hi @Henric Jungheim, is there any update on this issue?

Henric Jungheim 20 Reputation points

2024-08-09T20:27:42.6633333+00:00

The flush based on the remaining bytes is working. It would be better if there was some cleaner way to do this through Pipe's API, but this is the cleanest work-around we've found.

Anonymous

2024-08-12T02:59:46.7833333+00:00

If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

Henric Jungheim 20 Reputation points

2024-08-13T08:11:31.02+00:00

I can't promote your last suggested code to an answer (the one that includes the comment // Optionally, flush based on a threshold to avoid over-accumulating), and the initial suggestion suffers from the problem we're trying to avoid.

Anonymous

2024-08-13T08:24:26.8566667+00:00

What was your final solution? I can edit it into the answer. This will allow more people to solve related problems.

Henric Jungheim 20 Reputation points

2024-08-13T15:31:53.8966667+00:00

We're using the "remaining" approach, but without looking at unflushed bytes (in our application, there's no point since we might as well change the buffer size to adjust the flush threshold).

1 additional answer

Answer 1

This work-around doesn't need to call .GetMemory() multiple times:

  PipeOptions _pipeOptions = ... ;
  const int FlushThreshold = ... ;
...
  var remaining = int.MaxValue;
  await foreach (var thing in source.ReadEmAllAsync())
  {
    var size = thing.Size;

    if (size > remaining)
    {
      // We could also check if the total bytes written
      // exceeds FlushThreshold before flushing and still
      // get good buffer utilization.
      await writer.FlushAsync();
    }

    var buffer = writer.GetMemory(size);
    
    thing.CopyTo(buffer.Span[..size]);

    remaining = buffer.Length - size;

    writer.Advance(size);
  }

Does the current Pipe API not provide any way to do this so that the code does not need to make assumptions about the given pipe's buffering strategy?
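
To check how full the segments presented to the reader actually are, a small harness along the following lines may help. It is only a sketch under stated assumptions: SegmentProbe and RunAsync are hypothetical names, the items are fixed-size runs of a constant byte rather than real data, and the reader simply records the length of every segment it is given. The write loop is the "remaining" approach from the work-around above.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class SegmentProbe
{
    // Hypothetical harness: writes `count` items of `itemSize` bytes using the
    // "remaining" approach and returns the segment lengths seen by the reader.
    public static async Task<List<int>> RunAsync(int itemSize, int count, int minimumSegmentSize)
    {
        var pipe = new Pipe(new PipeOptions(minimumSegmentSize: minimumSegmentSize));
        var lengths = new List<int>();

        // Reader: record the length of each segment, then consume everything.
        var reading = Task.Run(async () =>
        {
            while (true)
            {
                ReadResult result = await pipe.Reader.ReadAsync();
                foreach (ReadOnlyMemory<byte> segment in result.Buffer)
                {
                    lengths.Add(segment.Length);
                }
                pipe.Reader.AdvanceTo(result.Buffer.End);
                if (result.IsCompleted)
                {
                    break;
                }
            }
            await pipe.Reader.CompleteAsync();
        });

        // Writer: flush only when the next item no longer fits in the
        // buffer that the previous GetMemory call returned.
        var remaining = int.MaxValue;
        for (var i = 0; i < count; i++)
        {
            if (itemSize > remaining)
            {
                await pipe.Writer.FlushAsync();
            }

            Memory<byte> buffer = pipe.Writer.GetMemory(itemSize);
            buffer.Span[..itemSize].Fill(0x2A); // placeholder payload
            remaining = buffer.Length - itemSize;
            pipe.Writer.Advance(itemSize);
        }

        await pipe.Writer.CompleteAsync(); // makes any unflushed tail visible
        await reading;
        return lengths;
    }
}
```

Calling, for example, `await SegmentProbe.RunAsync(5, 100_000, 128 * 1024)` and printing the returned list shows the segment sizes for a given item size. The exact lengths depend on the memory pool behind the pipe, so the harness is meant for observing the pattern rather than for asserting specific numbers.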
