Encoding.GetPreamble Method

Definition

When overridden in a derived class, returns a sequence of bytes that specifies the encoding used.

public virtual byte[] GetPreamble ();

Returns

Byte[]

A byte array containing a sequence of bytes that specifies the encoding used.

-or-

A byte array of length zero, if a preamble is not required.

Examples

The following example determines the byte order of the encoding based on the preamble.

using System;
using System.Text;

namespace GetPreambleExample
{
   class GetPreambleExampleClass
   {
      static void Main()
      {
         Encoding unicode = Encoding.Unicode;

         // Get the preamble for the Unicode encoder. 
         // In this case the preamble contains the byte order mark (BOM).
         byte[] preamble = unicode.GetPreamble();

         // Make sure a preamble was returned 
         // and is large enough to contain a BOM.
         if(preamble.Length >= 2)
         {
            if(preamble[0] == 0xFE && preamble[1] == 0xFF)
            {
               Console.WriteLine("The Unicode encoder is encoding in big-endian order.");
            }
            else if(preamble[0] == 0xFF && preamble[1] == 0xFE)
            {
               Console.WriteLine("The Unicode encoder is encoding in little-endian order.");
            }
         }
      }
   }
}

/*
This code produces the following output.

The Unicode encoder is encoding in little-endian order.

*/

Remarks

Optionally, the Encoding object provides a preamble that is an array of bytes that can be prefixed to the sequence of bytes resulting from the encoding process. If the preamble contains a byte order mark (in Unicode, code point U+FEFF), it helps the decoder determine the byte order and the transformation format or UTF.

The Unicode byte order mark (BOM) is serialized as follows (in hexadecimal):

  • UTF-8: EF BB BF

  • UTF-16 big endian byte order: FE FF

  • UTF-16 little endian byte order: FF FE

  • UTF-32 big endian byte order: 00 00 FE FF

  • UTF-32 little endian byte order: FF FE 00 00

You should use the BOM, because it provides nearly certain identification of an encoding for files that otherwise have lost reference to the Encoding object, for example, untagged or improperly tagged web data or random text files stored when a business did not have international concerns or other data. Often user problems might be avoided if data is consistently and properly tagged, preferably in UTF-8 or UTF-16.

For standards that provide an encoding type, a BOM is somewhat redundant. However, it can be used to help a server send the correct encoding header. Alternatively, it can be used as a fallback in case the encoding is otherwise lost.

There are some disadvantages to using a BOM. For example, knowing how to limit the database fields that use a BOM can be difficult. Concatenation of files can be a problem also, for example, when files are merged in such a way that an unnecessary character can end up in the middle of data. In spite of the few disadvantages, however, the use of a BOM is highly recommended.

For more information on byte order and the byte order mark, see The Unicode Standard at the Unicode home page.

Caution

To ensure that the encoded bytes are decoded properly, you should prefix encoded bytes with a preamble. However, most encodings do not provide a preamble. To ensure that the encoded bytes are decoded properly, you should use a Unicode encoding, that is, UTF8Encoding, UnicodeEncoding, or UTF32Encoding, with a preamble.

Applies to

Product Versions
.NET Core 1.0, Core 1.1, Core 2.0, Core 2.1, Core 2.2, Core 3.0, Core 3.1, 5, 6, 7, 8, 9
.NET Framework 1.1, 2.0, 3.0, 3.5, 4.0, 4.5, 4.5.1, 4.5.2, 4.6, 4.6.1, 4.6.2, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1
.NET Standard 1.0, 1.1, 1.2, 1.3, 1.4, 1.6, 2.0, 2.1
UWP 10.0