How to: Perform Streaming Transformations of Text to XML

One approach to processing a text file is to write an extension method that streams the file one line at a time by using the yield return construct. You can then write a LINQ query that processes the lines with deferred (lazy) execution. If you also use XStreamingElement to stream the output, the transformation from text to XML uses a minimal amount of memory, regardless of the size of the source text file.

Streaming transformations come with some caveats. They work best when you can process the entire file in a single pass and handle the lines in the order in which they occur in the source document. If you have to process the file more than once, or sort the lines before you can process them, you lose many of the benefits of a streaming technique.
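
The single-pass caveat can be observed directly. The following is a minimal sketch rather than part of the original example: SinglePassDemo and CountTwice are hypothetical names, and Lines is assumed to behave like the line-streaming extension method shown in the example in this topic. Because the sequence reads from one forward-only reader, a second enumeration finds the reader already at the end of its stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class SinglePassDemo
{
    // Streams lines lazily from the reader, one per MoveNext call.
    public static IEnumerable<string> Lines(this StreamReader source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        string line;
        while ((line = source.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Hypothetical helper: enumerates the same streamed sequence twice
    // and returns the number of lines seen on each pass.
    public static int[] CountTwice(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        using (StreamReader sr = new StreamReader(new MemoryStream(bytes)))
        {
            IEnumerable<string> lines = sr.Lines();
            int first = lines.Count();   // consumes the reader to the end
            int second = lines.Count();  // reader is exhausted, so this pass sees no lines
            return new[] { first, second };
        }
    }
}
```

To make a second pass, the reader would have to be rewound or the file reopened, which is the extra cost that the caveat refers to.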

Example

The following text file, People.txt, is the source for this example.

#This is a comment
1,Tai,Yee,Writer
2,Nikolay,Grachev,Programmer
3,David,Wright,Inventor

The following code contains an extension method that streams the lines of the text file in a deferred fashion.

Note

The following example uses the yield return construct of C#. The Visual Basic version provides equivalent behavior by using classes that implement the IEnumerable(Of String) and IEnumerator(Of String) interfaces. For an example of implementing IEnumerable(Of T) in Visual Basic, see Walkthrough: Implementing IEnumerable(Of T) in Visual Basic.

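using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamReaderSequence
{
    // Lazily yields one line at a time; nothing is read until enumeration starts.
    public static IEnumerable<string> Lines(this StreamReader source)
    {
        // Because this method is an iterator, the null check runs on the first MoveNext call.
        if (source == null)
            throw new ArgumentNullException("source");

        string line;
        while ((line = source.ReadLine()) != null)
        {
            yield return line;
        }
    }
}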

class Program
{
    static void Main(string[] args)
    {
        // The using statement closes the reader even if the query throws.
        using (StreamReader sr = new StreamReader("People.txt"))
        {
            // The query is deferred: no line is read until the tree is serialized.
            XStreamingElement xmlTree = new XStreamingElement("Root",
                from line in sr.Lines()
                where !line.StartsWith("#")   // skip comment lines before splitting
                let items = line.Split(',')
                select new XElement("Person",
                           new XAttribute("ID", items[0]),
                           new XElement("First", items[1]),
                           new XElement("Last", items[2]),
                           new XElement("Occupation", items[3])
                       )
            );
            Console.WriteLine(xmlTree);
        }
    }
}

Module Module1
    Sub Main()
        Dim sr = New IO.StreamReader("..\..\People.txt")
        Dim xmlTree = New XStreamingElement("Root",
            From line In sr.Lines()
            Where Not line.StartsWith("#")
            Let items = Split(line, ",")
            Select <Person ID=<%= items(0) %>>
                       <First><%= items(1) %></First>
                       <Last><%= items(2) %></Last>
                       <Occupation><%= items(3) %></Occupation>
                   </Person>
                   )

        Console.WriteLine(xmlTree)
        sr.Close()
    End Sub
End Module

Module StreamReaderSequence
    <System.Runtime.CompilerServices.Extension()>
    Public Function Lines(ByVal source As IO.StreamReader) As IEnumerable(Of String)
        If source Is Nothing Then Throw New ArgumentNullException("source")
        Return New StreamReaderEnumerable(source)
    End Function
End Module


Public Class StreamReaderEnumerable
    Implements IEnumerable(Of String)

    Private _source As IO.StreamReader

    Public Sub New(ByVal source As IO.StreamReader)
        _source = source
    End Sub

    Public Function GetEnumerator() As Generic.IEnumerator(Of String) Implements IEnumerable(Of String).GetEnumerator
        Return New StreamReaderEnumerator(_source)
    End Function

    Public Function GetEnumerator1() As IEnumerator Implements IEnumerable.GetEnumerator
        Return Me.GetEnumerator()
    End Function
End Class

Public Class StreamReaderEnumerator
    Implements IEnumerator(Of String)

    Private _current As String
    Private _source As IO.StreamReader

    Public Sub New(ByVal source As IO.StreamReader)
        _source = source
    End Sub


    Public ReadOnly Property Current As String Implements Generic.IEnumerator(Of String).Current
        Get
            Return _current
        End Get
    End Property

    Public ReadOnly Property Current1 As Object Implements IEnumerator.Current
        Get
            Return Me.Current
        End Get
    End Property

    Public Function MoveNext() As Boolean Implements IEnumerator.MoveNext
        _current = _source.ReadLine()
        Return _current IsNot Nothing
    End Function

    Public Sub Reset() Implements IEnumerator.Reset
        _current = Nothing
        _source.DiscardBufferedData()
        _source.BaseStream.Seek(0, IO.SeekOrigin.Begin)
    End Sub


    Public Sub Dispose() Implements IDisposable.Dispose
        ' Nothing to release here: the caller owns and closes the StreamReader.
    End Sub

End Class

This example produces the following output:

<Root>
  <Person ID="1">
    <First>Tai</First>
    <Last>Yee</Last>
    <Occupation>Writer</Occupation>
  </Person>
  <Person ID="2">
    <First>Nikolay</First>
    <Last>Grachev</Last>
    <Occupation>Programmer</Occupation>
  </Person>
  <Person ID="3">
    <First>David</First>
    <Last>Wright</Last>
    <Occupation>Inventor</Occupation>
  </Person>
</Root>
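
Passing the tree to Console.WriteLine calls ToString, which builds the complete XML text as a single string before it is printed. For a large source file, writing the tree directly to a file keeps the output streamed as well. The following is a minimal sketch under stated assumptions: StreamingSaveSketch and TransformToFile are hypothetical names, and File.ReadLines (available in .NET Framework 4 and later) is swapped in for the Lines extension method so that the block is self-contained.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

public static class StreamingSaveSketch
{
    // Hypothetical helper: transforms a comma-separated text file into XML,
    // writing each Person element as its source line is read.
    public static void TransformToFile(string inputPath, string outputPath)
    {
        XStreamingElement xmlTree = new XStreamingElement("Root",
            from line in File.ReadLines(inputPath)   // enumerates the file lazily
            where !line.StartsWith("#")              // skip comment lines
            let items = line.Split(',')
            select new XElement("Person",
                       new XAttribute("ID", items[0]),
                       new XElement("First", items[1]),
                       new XElement("Last", items[2]),
                       new XElement("Occupation", items[3])
                   )
        );

        // Save evaluates the deferred query while it writes the output.
        xmlTree.Save(outputPath);
    }
}
```

A call such as StreamingSaveSketch.TransformToFile("People.txt", "People.xml") would write the same Root element shown above to People.xml; the output file name is only an illustration.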

See Also

Reference

XStreamingElement

Concepts

Advanced Query Techniques (LINQ to XML)