Upraviť

Zdieľať cez


How to customize character encoding with System.Text.Json

By default, the serializer escapes all non-ASCII characters. That is, it replaces them with \uxxxx where xxxx is the Unicode code of the character. For example, if the Summary property in the following JSON is set to Cyrillic жарко, the WeatherForecast object is serialized as shown in this example:

{
  "Date": "2019-08-01T00:00:00-07:00",
  "TemperatureCelsius": 25,
  "Summary": "\u0436\u0430\u0440\u043A\u043E"
}

Serialize language character sets

To serialize the character sets of one or more languages without escaping, specify Unicode ranges when creating an instance of System.Text.Encodings.Web.JavaScriptEncoder, as shown in the following example:

using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;
Imports System.Text.Encodings.Web
Imports System.Text.Json
Imports System.Text.Unicode
var options1 = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic),
    WriteIndented = true
};
jsonString = JsonSerializer.Serialize(weatherForecast, options1);
options = New JsonSerializerOptions With {
    .Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic),
    .WriteIndented = True
}
jsonString = JsonSerializer.Serialize(weatherForecast1, options)

This code doesn't escape Cyrillic or Greek characters. If the Summary property is set to Cyrillic жарко, the WeatherForecast object is serialized as shown in this example:

{
  "Date": "2019-08-01T00:00:00-07:00",
  "TemperatureCelsius": 25,
  "Summary": "жарко"
}

By default, the encoder is initialized with the BasicLatin range.

To serialize all language sets without escaping, use UnicodeRanges.All.

Serialize specific characters

An alternative is to specify individual characters that you want to allow through without being escaped. The following example serializes only the first two characters of жарко:

using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;
Imports System.Text.Encodings.Web
Imports System.Text.Json
Imports System.Text.Unicode
var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowCharacters('\u0436', '\u0430');
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
var options2 = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(encoderSettings),
    WriteIndented = true
};
jsonString = JsonSerializer.Serialize(weatherForecast, options2);
Dim encoderSettings As TextEncoderSettings = New TextEncoderSettings
encoderSettings.AllowCharacters(ChrW(&H436), ChrW(&H430))
encoderSettings.AllowRange(UnicodeRanges.BasicLatin)
options = New JsonSerializerOptions With {
    .Encoder = JavaScriptEncoder.Create(encoderSettings),
    .WriteIndented = True
}
jsonString = JsonSerializer.Serialize(weatherForecast1, options)

Here's an example of JSON produced by the preceding code:

{
  "Date": "2019-08-01T00:00:00-07:00",
  "TemperatureCelsius": 25,
  "Summary": "жа\u0440\u043A\u043E"
}

Block lists

The preceding sections show how to specify allow lists of code points or ranges that you don't want to be escaped. However, there are global and encoder-specific block lists that can override certain code points in your allow list. Code points in a block list are always escaped, even if they're included in your allow list.

Global block list

The global block list includes things like private-use characters, control characters, undefined code points, and certain Unicode categories, such as the Space_Separator category, excluding U+0020 SPACE. For example, U+3000 IDEOGRAPHIC SPACE is escaped even if you specify Unicode range CJK Symbols and Punctuation (U+3000-U+303F) as your allow list.

The global block list is an implementation detail that has changed in every release of .NET. Don't take a dependency on a character being a member of (or not being a member of) the global block list.

Encoder-specific block lists

Examples of encoder-specific blocked code points include '<' and '&' for the HTML encoder, '\' for the JSON encoder, and '%' for the URL encoder. For example, the HTML encoder always escapes ampersands ('&'), even though the ampersand is in the BasicLatin range and all the encoders are initialized with BasicLatin by default.

Serialize all characters

To minimize escaping, you can use JavaScriptEncoder.UnsafeRelaxedJsonEscaping, as shown in the following example:

using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;
Imports System.Text.Encodings.Web
Imports System.Text.Json
Imports System.Text.Unicode
var options3 = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};
jsonString = JsonSerializer.Serialize(weatherForecast, options3);
options = New JsonSerializerOptions With {
    .Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    .WriteIndented = True
}
jsonString = JsonSerializer.Serialize(weatherForecast1, options)

Caution

Compared to the default encoder, the UnsafeRelaxedJsonEscaping encoder is more permissive about allowing characters to pass through unescaped:

  • It doesn't escape HTML-sensitive characters such as <, >, &, and '.
  • It doesn't offer any additional defense-in-depth protections against XSS or information disclosure attacks, such as those which might result from the client and server disagreeing on the charset.

Use the unsafe encoder only when it's known that the client will be interpreting the resulting payload as UTF-8 encoded JSON. For example, you can use it if the server is sending the response header Content-Type: application/json; charset=utf-8. Never allow the raw UnsafeRelaxedJsonEscaping output to be emitted into an HTML page or a <script> element.

See also