Converting Hex String To Corresponding Byte Array Using C#

I recently needed to convert a string representing a HEX value into the corresponding byte array. There are various solutions out there that all accomplish the same task: take a string and convert it to the equivalent byte array.

For example, the string 0x0123456789ABCDEF should become the bytes 0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD and 0xEF. The basic .NET encoding functions give you a byte array of the ASCII character codes, not the binary values those characters represent.

There are all kinds of solutions out there. Some claim to be faster than others, or simply the fastest. I am sure the solution I came up with is not the prettiest or the most efficient, but it is my own. It has probably been thought of before, but I have not seen any post quite like it, and I think it is a pretty good solution for what I wanted to accomplish. That said, I love learning, so if you find a problem or have an idea to improve it, please let me know! All feedback is welcome: The Good, The Bad and The Ugly.

The Problem

Convert a string of HEX digits into a byte array holding the values those digits represent, rather than the character codes of the text.
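To make the distinction concrete, here is a small sketch contrasting the two. `Encoding.ASCII.GetBytes` is the standard .NET call; `HexVersusText` and its methods are hypothetical names for illustration, and `ParsedValues` uses the simple pair-by-pair `Convert.ToByte(string, 16)` approach, which is one of the common solutions out there, not the one developed in this post. It assumes an even-length string without a '0x' prefix.

```csharp
using System;
using System.Text;

// Hypothetical helper class, used only to contrast the two interpretations.
public static class HexVersusText
{
    // What the basic encoding APIs return: one byte per character code.
    // For "01AB" this is { 0x30, 0x31, 0x41, 0x42 }, i.e. '0', '1', 'A', 'B'.
    public static byte[] CharacterCodes(string hex) => Encoding.ASCII.GetBytes(hex);

    // What we actually want: each pair of hex digits parsed as one byte.
    // For "01AB" this is { 0x01, 0xAB }.
    // Assumes an even-length string with no "0x" prefix.
    public static byte[] ParsedValues(string hex)
    {
        var bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        return bytes;
    }
}
```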

Considerations

I spent some time thinking about what I would have to take into consideration when converting the string to a byte array.

Special Cases: the input is valid and expected, but it requires special handling.

  1. The input string is null or empty.
  2. The string starts with the HEX indicator '0x'.
  3. The leading '0' has been dropped from the string, leaving an odd number of digits.
  4. The casing of the letters: they can be upper, lower or mixed case.
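As a rough illustration of what the four special cases mean for the input, the hypothetical `HexInputNormalizer.Normalize` below reduces each of them to a plain, even-length, upper-case digit string. It is only a sketch: it allocates new strings, which the solution in this post avoids by handling these cases during the conversion itself.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class HexInputNormalizer
{
    public static string Normalize(string value)
    {
        // 1. Null or empty input: nothing to convert.
        if (string.IsNullOrEmpty(value))
            return string.Empty;

        // 2. Optional HEX indicator; "0X" is accepted as well as "0x".
        int start = value.StartsWith("0x", StringComparison.OrdinalIgnoreCase) ? 2 : 0;
        string digits = value.Substring(start);

        // 3. Dropped leading zero: an odd digit count gets its '0' back.
        if (digits.Length % 2 != 0)
            digits = "0" + digits;

        // 4. Casing: fold to upper case so later code only has to handle 'A'-'F'.
        return digits.ToUpperInvariant();
    }
}
```

For example, `Normalize("0xabc")` yields "0ABC", and both `Normalize(null)` and `Normalize("0x")` yield an empty string.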

Error Checking: We live in a world where we can't trust anything, not even our input string, so I need some validation in place for the input.

  1. Does the string contain characters that are not valid hexadecimal digits: 0-9, A-F and a-f?
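A check for the allowed characters can be written directly against `char` ranges. The hypothetical `HexDigits.IsHexDigit` below is a minimal sketch of that validation. Comparing `char` values rather than casting to `byte` first matters, because the cast silently drops the upper 8 bits of a UTF-16 character.

```csharp
// Hypothetical helper, for illustration only.
public static class HexDigits
{
    // True only for '0'-'9', 'A'-'F' and 'a'-'f'.
    // Working on the char itself means a character such as '\u0141'
    // cannot be truncated to 0x41 ('A') and slip through.
    public static bool IsHexDigit(char c) =>
        (c >= '0' && c <= '9') ||
        (c >= 'A' && c <= 'F') ||
        (c >= 'a' && c <= 'f');
}
```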

Approach

Taking these factors into account, I came up with the following approach.

To handle null or empty input, I return an empty array. To avoid unnecessary allocations, I created a static, readonly empty byte array that is returned to the caller.

 private static readonly byte[] Empty = new byte[0];
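Newer frameworks ship the same idea built in. A minimal sketch comparing the hand-rolled field with `Array.Empty<T>()`, assuming .NET Framework 4.6 or later (or .NET Core); `EmptyResult` is a hypothetical wrapper class used only for illustration:

```csharp
using System;

// Hypothetical wrapper class, for illustration only.
public static class EmptyResult
{
    // The approach from this post: one shared, zero-length array allocated once.
    private static readonly byte[] Empty = new byte[0];

    public static byte[] HandRolled() => Empty;

    // Array.Empty<T>() returns a cached zero-length array,
    // so every call hands back the same instance.
    public static byte[] FromFramework() => Array.Empty<byte>();
}
```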

Once I have determined that I actually have a string, I handle the rest of the conditions during the conversion, except the dropped leading zero ('0'), which I handle before the actual conversion.
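The dropped leading zero only changes the arithmetic at the start of the conversion. A small sketch of that arithmetic, with hypothetical names: "ABC" has 3 digits, so it produces (3 + 1) / 2 = 2 bytes, 0x0A and 0xBC, and the first byte is built from the single character 'A'.

```csharp
// Hypothetical helper, for illustration only.
public static class HexLayout
{
    // Bytes produced by `digitCount` hex digits when an odd count is treated
    // as having lost its leading '0': the count rounded up to a whole byte.
    public static int ByteCount(int digitCount) => (digitCount + 1) / 2;

    // With an odd digit count, the first byte comes from a single character
    // (it becomes the low nibble); every following byte consumes two characters.
    public static bool FirstByteHasOneDigit(int digitCount) => digitCount % 2 != 0;
}
```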

For the actual conversion, I decided to read 2 characters at a time: the upper and the lower nibble, each of which I convert individually to its equivalent value before combining them into one byte. This is also the point where I validate that each character is actually a hexadecimal digit.
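Combining the two nibbles is a shift and a bitwise OR. The sketch below shows that step in isolation. `Nibbles.Combine` and `DigitValue` are hypothetical helpers, and `DigitValue` assumes its input has already been validated. The `c | 0x20` trick folds 'A'-'F' onto 'a'-'f', because the two ranges differ only in bit 0x20.

```csharp
// Hypothetical helper, for illustration only.
public static class Nibbles
{
    // Value (0-15) of a single, already validated hex digit.
    // (c | 0x20) maps 'A'-'F' (0x41-0x46) onto 'a'-'f' (0x61-0x66).
    static int DigitValue(char c) =>
        c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10;

    // Upper nibble is shifted into bits 4-7; lower nibble fills bits 0-3.
    public static byte Combine(char high, char low) =>
        (byte)((DigitValue(high) << 4) | DigitValue(low));
}
```

For example, `Nibbles.Combine('A', 'b')` returns 0xAB: 0xA shifted left by 4 gives 0xA0, and OR-ing in 0xB gives 0xAB.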

My Solution

Now to the fun part: coding. I broke my solution into 2 distinct functions: ConvertToByteArray and FromCharacterToByte.

ConvertToByteArray: This is the public function exposed to the caller. It is responsible for basic input validation, for handling items 1, 2 and 3 from the special cases I identified, and for constructing the byte array returned to the caller.

    public static byte[] ConvertToByteArray(string value)
    {
        if (String.IsNullOrEmpty(value))
            return Empty;

        // Skip the optional leading HEX indicator; accept both "0x" and "0X".
        int character_index = value.StartsWith("0x", StringComparison.OrdinalIgnoreCase) ? 2 : 0;
        int number_of_characters = value.Length - character_index;

        // An odd number of digits means the leading '0' has been stripped from the string.
        bool add_leading_zero = (0 != (number_of_characters % 2));
        if (add_leading_zero)
            number_of_characters += 1;

        byte[] bytes = new byte[number_of_characters / 2]; // Two HEX digits per byte.

        int write_index = 0;
        if (add_leading_zero)
        {
            // The lone first digit becomes the lower nibble of the first byte.
            bytes[write_index++] = FromCharacterToByte(value[character_index], character_index);
            character_index += 1;
        }

        for (int read_index = character_index; read_index < value.Length; read_index += 2)
        {
            byte upper = FromCharacterToByte(value[read_index], read_index, 4);      // bits 4-7
            byte lower = FromCharacterToByte(value[read_index + 1], read_index + 1); // bits 0-3

            bytes[write_index++] = (byte)(upper | lower);
        }

        return bytes;
    }

FromCharacterToByte: This is the private workhorse function. Its main responsibility is to convert the character in hand to its equivalent value. For letters it also handles the casing, and for every character it performs the validation. To reuse this code for both halves of a byte, the function takes the nibble position into account and shifts the value accordingly.

    private static byte FromCharacterToByte(char character, int index, int shift = 0)
    {
        // Compare on the char itself: casting to byte first would truncate characters
        // above 0xFF (e.g. '\u0141' would become 0x41, the code for 'A').
        int value;
        if (character >= '0' && character <= '9')       // 0x30 - 0x39
            value = character - '0';
        else if (character >= 'A' && character <= 'F')  // 0x41 - 0x46
            value = character - 'A' + 0xA;
        else if (character >= 'a' && character <= 'f')  // 0x61 - 0x66
            value = character - 'a' + 0xA;
        else
            throw new FormatException(String.Format("Character '{0}' at index '{1}' is not a valid hexadecimal character.", character, index));

        // Move the value into position: shift = 4 for the upper nibble, 0 for the lower one.
        return (byte)(value << shift);
    }
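For comparison, a frequently suggested alternative to per-character range checks is a lookup table indexed by the character code. The self-contained sketch below (the `LookupHexConverter` class and its members are hypothetical, not part of this post's solution) handles the same special cases: null or empty input, an optional '0x'/'0X' prefix, a dropped leading zero, and mixed casing. Whether it is actually faster than `FromCharacterToByte` would need to be measured.

```csharp
using System;

// Hypothetical alternative, for illustration only.
public static class LookupHexConverter
{
    // Maps each character code below 128 to its value, or 0xFF if it is not a hex digit.
    private static readonly byte[] NibbleTable = BuildTable();

    private static byte[] BuildTable()
    {
        var table = new byte[128];
        for (int i = 0; i < table.Length; i++) table[i] = 0xFF;
        for (int i = 0; i < 10; i++) table['0' + i] = (byte)i;
        for (int i = 0; i < 6; i++)
        {
            table['A' + i] = (byte)(10 + i);
            table['a' + i] = (byte)(10 + i);
        }
        return table;
    }

    private static int Nibble(string value, int index)
    {
        char c = value[index];
        int nibble = c < 128 ? NibbleTable[c] : 0xFF; // anything outside ASCII is rejected
        if (nibble == 0xFF)
            throw new FormatException($"Character '{c}' at index {index} is not a valid hexadecimal digit.");
        return nibble;
    }

    public static byte[] ConvertToByteArray(string value)
    {
        if (string.IsNullOrEmpty(value))
            return Array.Empty<byte>();

        int index = value.StartsWith("0x", StringComparison.OrdinalIgnoreCase) ? 2 : 0;
        int digits = value.Length - index;
        var bytes = new byte[(digits + 1) / 2];
        int write = 0;

        if (digits % 2 != 0) // dropped leading zero: the lone first digit is a low nibble
            bytes[write++] = (byte)Nibble(value, index++);

        for (; index < value.Length; index += 2)
            bytes[write++] = (byte)((Nibble(value, index) << 4) | Nibble(value, index + 1));

        return bytes;
    }
}
```

With this sketch, `ConvertToByteArray("abc")` returns { 0x0A, 0xBC }, and input such as "0x1*" is rejected with a `FormatException`.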

I understand that this function only takes a simple string as input. For now this does the trick, and if the time comes for me to read the value from a file, support for streams is easy to add.
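A stream-based variant could look roughly like the sketch below. It is an illustration under several assumptions, not a finished implementation: `HexStreamConverter.ConvertStream` is a hypothetical name, it does not strip a '0x' prefix, and because the total length is not known up front it cannot apply the dropped-leading-zero rule, so it rejects an odd number of digits instead. It also skips whitespace, on the assumption that a text file may contain line breaks.

```csharp
using System;
using System.IO;

// Hypothetical streaming variant, for illustration only.
public static class HexStreamConverter
{
    // Reads hex digits from `reader` and writes the decoded bytes to `output`.
    public static void ConvertStream(TextReader reader, Stream output)
    {
        var buffer = new char[4096];
        int pendingHigh = -1; // upper nibble waiting for its partner, possibly across buffer reads
        int position = 0;     // index of the current character in the whole input
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++, position++)
            {
                if (char.IsWhiteSpace(buffer[i]))
                    continue; // assumption: line breaks and spaces are ignored

                int nibble = NibbleOf(buffer[i], position);
                if (pendingHigh < 0)
                {
                    pendingHigh = nibble;
                }
                else
                {
                    output.WriteByte((byte)((pendingHigh << 4) | nibble));
                    pendingHigh = -1;
                }
            }
        }
        if (pendingHigh >= 0)
            throw new FormatException("Input ended with an unpaired hex digit.");
    }

    private static int NibbleOf(char c, int position)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        throw new FormatException($"Character '{c}' at index {position} is not a valid hexadecimal digit.");
    }
}
```

A `StreamReader` over the input file and a `FileStream` or `MemoryStream` for the output would be typical arguments.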

Usage

Calling this function is very simple. I created a class named 'StringConverter' to hold both functions.

 byte[] bytes = StringConverter.ConvertToByteArray("0x0123456789AbCdEf");
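To eyeball the result, `BitConverter.ToString` is handy: it renders each byte as an upper-case pair separated by '-'. The `HexRoundTrip` class below is a hypothetical helper for illustration only.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class HexRoundTrip
{
    // e.g. { 0x01, 0xAB } -> "01-AB"
    public static string Describe(byte[] bytes) => BitConverter.ToString(bytes);

    // Same text without separators, e.g. { 0x01, 0xAB } -> "01AB".
    public static string Undashed(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", "");
}
```

With the `StringConverter` class from this post, `HexRoundTrip.Describe(StringConverter.ConvertToByteArray("0x0123456789AbCdEf"))` should give "01-23-45-67-89-AB-CD-EF". For even-length input, `Undashed` returns the original digits in upper case without the '0x' prefix, which makes a simple round-trip check.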

Performance

As always, I was curious how it performs with a large string. I created a text file and repeated the value '0123456789AbCdEf' over and over again, until I ended up with an 18.5MB file. Since the function does not support reading from a stream, I simply loaded the whole file into a string in memory using StreamReader and passed it to the function.

I was actually surprised by the performance results.

Including reading the content of the file, the conversion completed in about 500ms (0.5 seconds). Not bad.

When I moved the file read outside the timed code, reading the massive string into a variable first and then passing it to the conversion function, the total time consistently dropped to around 190ms (0.19 seconds).
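For anyone who wants to repeat the measurement, here is a rough sketch of a timing harness using `System.Diagnostics.Stopwatch`. The class and method names are hypothetical. `BuildInput` builds the repeated string in memory rather than reading it from a file, and `Measure` times only the conversion, after one warm-up call so JIT compilation is not counted. Results will vary with hardware, runtime version and build configuration, so numbers from this harness are not directly comparable to the ones above.

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical benchmark helper, for illustration only.
public static class ConversionBenchmark
{
    // Repeats `pattern` `count` times, mimicking the generated test file.
    public static string BuildInput(string pattern, int count)
    {
        var sb = new StringBuilder(pattern.Length * count);
        for (int i = 0; i < count; i++)
            sb.Append(pattern);
        return sb.ToString();
    }

    // Times a single conversion of an in-memory string, after one warm-up run.
    public static TimeSpan Measure(Func<string, byte[]> convert, string input, out byte[] result)
    {
        convert(input); // warm-up, so JIT compilation is not part of the measurement
        var watch = Stopwatch.StartNew();
        result = convert(input);
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Usage would be along the lines of `ConversionBenchmark.Measure(StringConverter.ConvertToByteArray, ConversionBenchmark.BuildInput("0123456789AbCdEf", 1000000), out byte[] bytes)`, where the repeat count is arbitrary.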

Conclusion

The results are surprising, and not too bad for about an hour's worth of work: converting an 18.5MB string into the equivalent byte array using .NET in less than 200ms.