SQL Server and custom R: question marks added to output

Olga Larina 26 Reputation points
2021-09-14T07:07:11.667+00:00

After updating R according to the tutorial https://learn.microsoft.com/en-us/sql/machine-learning/install/custom-runtime-r?view=sql-server-ver15&pivots=platform-windows, I ran this query:

EXEC sp_execute_external_script
@language =N'myR',
@script=N'
print("Hello RExtension!");'

Output:
STDOUT message(s) from external script:
[1] " ��Hello RExtension! ��"

Completion time: 2021-09-14T09:05:00.2416329+02:00

It seems to be an encoding problem. Does anyone have tips on solving it? Thank you!


2 answers

  1. Seeya Xi-MSFT 16,446 Reputation points
    2021-09-14T09:39:25.5+00:00

    Hi @Olga Larina ,

    You can try this setting:
    https://joehanna.com/sql-server/changing-the-default-encoding-of-sql-files-in-ssms/

    Best regards,
    Seeya



  2. Erland Sommarskog 101.8K Reputation points MVP
    2021-09-18T12:57:36.037+00:00

    Here is a small update. Together with a friend I dug into this a little more and I was able to analyse what is being sent. Consider this script:

       EXEC sp_execute_external_script  
              @language =N'myR',  
              @script=N'  
              x <- "räksmörgås"  
              print(x)  
              print(Encoding(x))'  
    

    ("räksmörgås" is Swedish for "shrimp sandwich".)

    SSMS prints this:

       STDOUT message(s) from external script:   
       [1] " ��räksmörgås ��"  
       [1] " ��latin1 ��"  
    

    The Swedish word has been encoded as UTF-8, but the string is then interpreted as Latin-1.

    I wrote a small Perl script to run the batch above. This permitted me to capture the actual output and then analyse the bytes. For the second line, I got these bytes for the first couple of characters:

       5b 31 5d 20 22 02 fd fd 72 c3 83 c2 a4 6b   
        [  1  ] SP  "     ý  ý  r  Ã  ƒ  Â  ¤  k  
    

    So it seems that this is a UTF-8 encoded string that has been interpreted as Latin-1 and then re-encoded into UTF-8 a second time. SSMS expects the string to be UTF-8, so it decodes only one layer of UTF-8. However, the sequence fd-fd is not legal UTF-8, and therefore SSMS displays the rhombus with the question mark. The character is known as REPLACEMENT CHARACTER, and tells us that there is an encoding error.

    The full sequence of mysterious characters is 02-fd-fd, but why it appears here, I don't know.

    So this does not solve the problem, but at least it gives an idea of what is going on.
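
The double-encoding explanation in the answer above can be checked without SQL Server. What follows is a minimal sketch in Python (not the Perl script the answer mentions, which is not shown in the thread). The function names are made up for illustration, and the code only reproduces the byte arithmetic; it says nothing about where in the R extension the extra conversion happens.

```python
# Standalone check of the double-encoding hypothesis; no SQL Server needed.
# All function names here are hypothetical helpers, not part of any API.


def double_encode(text: str) -> bytes:
    """Encode as UTF-8, misread the bytes as Latin-1, then encode as UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(raw: bytes) -> str:
    """Reverse double_encode(); raises if raw is not double-encoded UTF-8."""
    return raw.decode("utf-8").encode("latin-1").decode("utf-8")


def hex_dump(raw: bytes) -> str:
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in raw)


word = "räksmörgås"
doubled = double_encode(word)

# "rä" becomes 72 c3 83 c2 a4: the two UTF-8 bytes of "ä" (c3 a4)
# each turn into a two-byte sequence of their own.
print(hex_dump(doubled[:5]))

# One layer of UTF-8 decoding, which is what a UTF-8 consumer applies,
# leaves mojibake rather than the original word.
print(doubled.decode("utf-8"))

# Undoing both layers recovers the original word.
print(undo_double_encoding(doubled))

# 0xfd never appears in well-formed UTF-8, so a lenient decoder
# substitutes U+FFFD REPLACEMENT CHARACTER for each stray byte.
stray = bytes([0x02, 0xFD, 0xFD])
print(ascii(stray.decode("utf-8", errors="replace")))
```

If output captured from the server survives `undo_double_encoding` once the stray 02-fd-fd bytes are stripped, that would support the hypothesis; if it raises, some other conversion is involved.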