How to customize character encoding with System.Text.Json
By default, the serializer escapes all non-ASCII characters. That is, it replaces each one with \uxxxx, where xxxx is the character's UTF-16 code unit in hexadecimal. For example, if the Summary property of a WeatherForecast object is set to Cyrillic жарко, the object is serialized as shown in this example:
{
  "Date": "2019-08-01T00:00:00-07:00",
  "TemperatureCelsius": 25,
  "Summary": "\u0436\u0430\u0440\u043A\u043E"
}
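The default behavior can be reproduced with a minimal C# sketch. To stay self-contained it serializes an anonymous object with the same three properties in place of the WeatherForecast type (an assumption, not the article's own class). Because no Encoder is set, JavaScriptEncoder.Default applies and the Summary value should be written as escape sequences rather than as жарко.

```csharp
using System;
using System.Text.Json;

// Stand-in for a WeatherForecast instance (hypothetical: an anonymous type
// with the same property names, used only to keep the sketch self-contained).
var forecast = new
{
    Date = new DateTimeOffset(2019, 8, 1, 0, 0, 0, TimeSpan.FromHours(-7)),
    TemperatureCelsius = 25,
    Summary = "жарко"
};

// No Encoder is configured, so the default encoder escapes every
// non-ASCII character as \uxxxx.
string json = JsonSerializer.Serialize(forecast);
Console.WriteLine(json);
```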
Serialize language character sets
To serialize the character sets of one or more languages without escaping, specify Unicode ranges when creating an instance of System.Text.Encodings.Web.JavaScriptEncoder, as shown in the following example:
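C#:
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// weatherForecast is an existing WeatherForecast instance whose Summary is "жарко".
// Only Basic Latin and Cyrillic pass through unescaped; other characters are escaped.
var options1 = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic),
    WriteIndented = true
};
string jsonString = JsonSerializer.Serialize(weatherForecast, options1);

Visual Basic:
Imports System.Text.Encodings.Web
Imports System.Text.Json
Imports System.Text.Unicode

' weatherForecast1 is an existing WeatherForecast instance whose Summary is "жарко".
' Only Basic Latin and Cyrillic pass through unescaped; other characters are escaped.
Dim options As New JsonSerializerOptions With {
    .Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic),
    .WriteIndented = True
}
Dim jsonString As String = JsonSerializer.Serialize(weatherForecast1, options)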
This code doesn't escape Cyrillic characters; characters outside the allowed ranges, such as Greek letters, are still escaped. If the Summary property is set to Cyrillic жарко, the WeatherForecast object is serialized as shown in this example:
{
  "Date": "2019-08-01T00:00:00-07:00",
  "TemperatureCelsius": 25,
  "Summary": "жарко"
}
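By default, the encoder (JavaScriptEncoder.Default) is initialized with the BasicLatin range. An encoder you create with JavaScriptEncoder.Create allows only the ranges you pass, which is why the example includes BasicLatin alongside Cyrillic.
To serialize all languages without escaping, use UnicodeRanges.All.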
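The UnicodeRanges.All option can be sketched as follows. The anonymous object and the Japanese sample string 暑い ("hot") are illustrative assumptions; the point is only that text from more than one script passes through unescaped once every range is allowed. The global and encoder-specific block lists described below still apply.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

var options = new JsonSerializerOptions
{
    // Allow every Unicode range; block lists can still force escaping.
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
};

// Hypothetical sample data: Cyrillic and Japanese text.
string json = JsonSerializer.Serialize(new { Summary = "жарко", Japanese = "暑い" }, options);
Console.WriteLine(json);
```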
Serialize specific characters
An alternative is to specify individual characters that you want to allow through without being escaped. The following example lets only the first two characters of жарко pass through unescaped:
C#:
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Allow Basic Latin plus only 'ж' (U+0436) and 'а' (U+0430);
// the remaining Cyrillic letters of "жарко" are still escaped.
var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowCharacters('\u0436', '\u0430');
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
var options2 = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(encoderSettings),
    WriteIndented = true
};
string jsonString = JsonSerializer.Serialize(weatherForecast, options2);

Visual Basic:
Imports System.Text.Encodings.Web
Imports System.Text.Json
Imports System.Text.Unicode

' Allow Basic Latin plus only "ж" (U+0436) and "а" (U+0430);
' the remaining Cyrillic letters of "жарко" are still escaped.
Dim encoderSettings As New TextEncoderSettings()
encoderSettings.AllowCharacters(ChrW(&H436), ChrW(&H430))
encoderSettings.AllowRange(UnicodeRanges.BasicLatin)
Dim options As New JsonSerializerOptions With {
    .Encoder = JavaScriptEncoder.Create(encoderSettings),
    .WriteIndented = True
}
Dim jsonString As String = JsonSerializer.Serialize(weatherForecast1, options)
Here's an example of JSON produced by the preceding code:
{
  "Date": "2019-08-01T00:00:00-07:00",
  "TemperatureCelsius": 25,
  "Summary": "жа\u0440\u043A\u043E"
}
Block lists
The preceding sections show how to specify allow lists of code points or ranges that you don't want to be escaped. However, there are global and encoder-specific block lists that can override certain code points in your allow list. Code points in a block list are always escaped, even if they're included in your allow list.
Global block list
The global block list includes things like private-use characters, control characters, undefined code points, and certain Unicode categories, such as the Space_Separator category, excluding U+0020 SPACE. For example, U+3000 IDEOGRAPHIC SPACE is escaped even if you specify the Unicode range CJK Symbols and Punctuation (U+3000-U+303F) as your allow list.
The global block list is an implementation detail that has changed in every release of .NET. Don't take a dependency on a character being a member of (or not being a member of) the global block list.
Encoder-specific block lists
Examples of encoder-specific blocked code points include '<' and '&' for the HTML encoder, '\' for the JSON encoder, and '%' for the URL encoder. For example, the HTML encoder always escapes ampersands ('&'), even though the ampersand is in the BasicLatin range and all the encoders are initialized with BasicLatin by default.
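A sketch of the encoder-specific block lists follows. Each encoder is created with UnicodeRanges.All, so any escaping that remains comes from a block list rather than from the allow list. The input strings are illustrative assumptions; the backslash result is shown only as "escaped" because the exact escape form is an encoder detail.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Allow every range so only block lists can cause escaping.
HtmlEncoder html = HtmlEncoder.Create(UnicodeRanges.All);
JavaScriptEncoder js = JavaScriptEncoder.Create(UnicodeRanges.All);
UrlEncoder url = UrlEncoder.Create(UnicodeRanges.All);

string htmlOut = html.Encode("a&b<c"); // a&amp;b&lt;c
string jsAmp = js.Encode("a&b");       // a\u0026b
string jsSlash = js.Encode(@"a\b");    // backslash is escaped
string urlOut = url.Encode("100%");    // 100%25

Console.WriteLine($"{htmlOut} {jsAmp} {jsSlash} {urlOut}");
```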
Serialize all characters
To minimize escaping, you can use JavaScriptEncoder.UnsafeRelaxedJsonEscaping, as shown in the following example:
C#:
using System.Text.Encodings.Web;
using System.Text.Json;

// UnsafeRelaxedJsonEscaping leaves HTML-sensitive and non-ASCII characters unescaped.
var options3 = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};
string jsonString = JsonSerializer.Serialize(weatherForecast, options3);

Visual Basic:
Imports System.Text.Encodings.Web
Imports System.Text.Json

' UnsafeRelaxedJsonEscaping leaves HTML-sensitive and non-ASCII characters unescaped.
Dim options As New JsonSerializerOptions With {
    .Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    .WriteIndented = True
}
Dim jsonString As String = JsonSerializer.Serialize(weatherForecast1, options)
Caution
Compared to the default encoder, the UnsafeRelaxedJsonEscaping encoder is more permissive about allowing characters to pass through unescaped:
- It doesn't escape HTML-sensitive characters such as <, >, &, and '.
- It doesn't offer any additional defense-in-depth protections against XSS or information disclosure attacks, such as those that might result from the client and server disagreeing on the charset.
Use the unsafe encoder only when it's known that the client will interpret the resulting payload as UTF-8 encoded JSON. For example, you can use it if the server sends the response header Content-Type: application/json; charset=utf-8. Never allow the raw UnsafeRelaxedJsonEscaping output to be emitted into an HTML page or a <script> element.
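The difference the caution describes can be seen by serializing the same string with both encoders. The payload string is an illustrative assumption. With the default encoder the angle brackets, ampersand, apostrophes, and Cyrillic letters are escaped; with UnsafeRelaxedJsonEscaping they are written as-is, which is exactly why that output must not be placed into HTML.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical payload mixing HTML-sensitive and non-ASCII characters.
string payload = "<b>жарко</b> & 'hot'";

// Default encoder: HTML-sensitive and non-ASCII characters become \uxxxx.
string safe = JsonSerializer.Serialize(payload);

// Relaxed encoder: the same characters are written unescaped.
string relaxed = JsonSerializer.Serialize(payload,
    new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping });

Console.WriteLine(safe);
Console.WriteLine(relaxed); // "<b>жарко</b> & 'hot'"
```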