@영주 홍 The first result in the NBest array is the best result, regardless of its confidence value. This is by design: when you use the detailed output format, you cannot restrict the response to exclude any of the results. You can refer to the Speech service FAQ, which documents this behavior.
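Because the service does not trim the NBest list, any filtering has to happen on the client side. The sketch below parses a detailed-format JSON result with the standard library only. It takes the first NBest entry as the best result and optionally drops entries below a confidence threshold. The sample JSON values and the 0.5 threshold are hypothetical. In the Python Speech SDK this JSON string is typically available as `result.json` when `speech_config.output_format` is set to `speechsdk.OutputFormat.Detailed`, but please verify that against the SDK reference for your version.

```python
import json

# Hypothetical detailed-format result: field names (NBest, Confidence, Display)
# follow the detailed JSON layout, but the values are invented for illustration.
SAMPLE_JSON = """
{
  "RecognitionStatus": "Success",
  "DisplayText": "Hello world.",
  "NBest": [
    {"Confidence": 0.42, "Lexical": "hello world", "Display": "Hello world."},
    {"Confidence": 0.61, "Lexical": "hello word", "Display": "Hello word."}
  ]
}
"""


def best_result(raw_json: str) -> dict:
    """Return the first NBest entry, which the service treats as the best one."""
    nbest = json.loads(raw_json).get("NBest", [])
    if not nbest:
        raise ValueError("no NBest entries in result")
    return nbest[0]


def filter_by_confidence(raw_json: str, threshold: float) -> list:
    """Keep only NBest entries whose Confidence is at or above `threshold`.

    The service returns every entry, so trimming is done here, in client code.
    `threshold` is an arbitrary, hypothetical value chosen by the caller.
    """
    nbest = json.loads(raw_json).get("NBest", [])
    return [entry for entry in nbest if entry.get("Confidence", 0.0) >= threshold]


if __name__ == "__main__":
    print(best_result(SAMPLE_JSON)["Display"])  # prints Hello world.
    print([e["Display"] for e in filter_by_confidence(SAMPLE_JSON, 0.5)])  # prints ['Hello word.']
```

Note that a confidence filter can remove the entry the service ranked first, as in this sample, so decide up front whether your application should fall back to `best_result` when no entry passes the threshold.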
Regarding the behavior of Speech Studio, we will get back to you after checking internally, because this could be either an implementation bug or a design choice to display the best result based on other settings.
I think, as per the documentation, WER (word error rate) data is shown only when you use audio + human-labeled transcript data.
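For context, WER is computed by comparing the recognized text against a reference transcript, which is why human-labeled transcripts are needed to show it. The standard definition, where $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted words and $N$ is the number of words in the reference transcript, is:

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

For example, if the reference is "hello world" and the recognizer returns "hello word", there is one substitution in a two-word reference, so $\mathrm{WER} = 1/2 = 50\%$.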