Here is a small update. Together with a friend, I dug into this a little more and was able to analyse what is actually being sent. Consider this script:
EXEC sp_execute_external_script
@language =N'myR',
@script=N'
x <- "räksmörgås"
print(x)
print(Encoding(x))'
("räksmörgås" is Swedish for "shrimp sandwich".)
SSMS prints this:
STDOUT message(s) from external script:
[1] " ��rÃ¤ksmÃ¶rgÃ¥s ��"
[1] " ��latin1 ��"
The Swedish word has been encoded as UTF-8, but the resulting bytes are then interpreted as Latin-1, and R's Encoding() also reports the string as latin1.
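The first step of this mix-up is easy to reproduce outside SQL Server. Here is a minimal Python sketch (the variable names are made up for illustration) that encodes the word as UTF-8 and then decodes those bytes as Latin-1, which is what the output suggests is happening:

```python
word = "räksmörgås"

# Step 1: the string is encoded as UTF-8 (each of ä, ö and å becomes two bytes).
utf8_bytes = word.encode("utf-8")

# Step 2 (the suspected error): the UTF-8 bytes are read as Latin-1,
# so every single byte turns into a character of its own.
misread = utf8_bytes.decode("latin-1")

print(misread)  # rÃ¤ksmÃ¶rgÃ¥s
```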
I wrote a small Perl script to run the batch above. This let me capture the raw output and analyse the bytes. For the second line of the output (the one with the Swedish word), these are the bytes for the first few characters:
5b 31 5d 20 22 02 fd fd 72 c3 83 c2 a4 6b
[ 1 ] SP " STX ý ý r Ã ƒ Â ¤ k
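The Perl script itself is not shown here. As a rough stand-in, the Python sketch below prints each byte next to its Windows-1252 glyph (Windows-1252 rather than strict Latin-1, so that 0x83 shows as ƒ) and applies it to "räksmörgås" after an assumed UTF-8 → Latin-1 → UTF-8 round trip. The helper name `dump` and the caret notation for control bytes are my own choices, not part of the original analysis:

```python
def dump(data: bytes) -> tuple[str, str]:
    """Hypothetical helper: one row of hex values, one row of per-byte glyphs."""
    glyphs = []
    for b in data:
        if b == 0x20:
            glyphs.append("SP")                 # space, spelled out as in the dump
        elif b < 0x20:
            glyphs.append("^" + chr(b + 0x40))  # caret notation, e.g. 0x02 -> ^B
        else:
            # Windows-1252 so that bytes such as 0x83 get a visible glyph (ƒ).
            glyphs.append(bytes([b]).decode("cp1252", errors="replace"))
    return data.hex(" "), " ".join(glyphs)


word = "räksmörgås"
# Assumed chain: encode as UTF-8, misread as Latin-1, encode as UTF-8 again.
double = word.encode("utf-8").decode("latin-1").encode("utf-8")

hex_row, glyph_row = dump(double[:6])
print(hex_row)    # 72 c3 83 c2 a4 6b
print(glyph_row)  # r Ã ƒ Â ¤ k
```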
So it seems that this is a UTF-8 encoded string that has been interpreted as Latin-1 (so that, for instance, ä becomes the two characters Ã¤) and then re-encoded into UTF-8 a second time. SSMS expects the string to be UTF-8, so it undoes only one layer of UTF-8 encoding. However, the sequence fd-fd is not legal UTF-8, and therefore SSMS displays the rhombus with the question mark. This character is U+FFFD, REPLACEMENT CHARACTER, and it tells us that there is an encoding error.
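This reasoning can be checked the same way. The sketch assumes the captured bytes consist of the unexplained 02-fd-fd prefix followed by the double-encoded word. Decoding them once as UTF-8 with Python's `errors="replace"` mimics what SSMS displays, and one further Latin-1 → UTF-8 step recovers the original word:

```python
word = "räksmörgås"
double = word.encode("utf-8").decode("latin-1").encode("utf-8")

# Assumed capture: the unexplained 02-fd-fd prefix, then the double-encoded word.
raw = bytes([0x02, 0xFD, 0xFD]) + double

# A single UTF-8 decode, as SSMS does. 0xFD can never occur in valid UTF-8,
# so each of the two bytes is replaced by U+FFFD REPLACEMENT CHARACTER.
shown = raw.decode("utf-8", errors="replace")
assert shown[1:3] == "\ufffd\ufffd"

# Undo the remaining layer: turn the characters back into bytes via Latin-1,
# then decode those bytes as UTF-8.
repaired = shown[3:].encode("latin-1").decode("utf-8")
print(repaired)  # räksmörgås
```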
The full mysterious sequence is 02-fd-fd, but I don't know why it appears here.
So this does not solve the problem, but at least it gives an idea of what is going on.