Problem with the Azure Speech to Text implementation on the LAVIE Tab 10FD3 tablet

nguyen thuan
2024-06-21T14:44:11.54+00:00

We are implementing Azure Speech to Text in our app and have tested it on several devices, but it does not work properly on the LAVIE Tab 10FD3 tablet (https://www.nec-lavie.jp/products/tablet/lavie/laviet10/).

The Google Speech to Text API works fine on the same tablet, so it does not seem to be a microphone or hardware problem. Is there anything we can change in our implementation to fix this?

Here is the Kotlin code we implemented:
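
```kotlin
import android.Manifest
import android.content.pm.PackageManager
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.os.Handler
import android.os.Looper
import androidx.core.app.ActivityCompat
import com.microsoft.cognitiveservices.speech.SpeechConfig
import com.microsoft.cognitiveservices.speech.SpeechRecognizer
import com.microsoft.cognitiveservices.speech.audio.AudioConfig
import kotlin.math.sqrt

// Init speechRecognizer to get input from the microphone
private val speechRecognizer: SpeechRecognizer by lazy {
    speechConfig = SpeechConfig.fromSubscription(SPEECH_SUBSCRIPTION_KEY, SPEECH_REGION)
    destroyMicrophoneStream() // in case it was previously initialized
    val stream = MicrophoneStream()
    microphoneStream = stream

    // Feed the recognizer from the stream created above, so only one
    // MicrophoneStream instance is holding the microphone.
    SpeechRecognizer(
        speechConfig,
        AudioConfig.fromStreamInput(stream),
    )
}

// Start recording the user's voice.
// A separate AudioRecord monitors the input level: if the user does not speak
// for longer than silenceTimeout, the recording is stopped.
private fun startRecording() {
    if (ActivityCompat.checkSelfPermission(
            this,
            Manifest.permission.RECORD_AUDIO,
        ) != PackageManager.PERMISSION_GRANTED
    ) {
        return
    }
    audioRecord = AudioRecord(
        MediaRecorder.AudioSource.MIC,
        44100,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        bufferSize,
    )
    // If the device cannot open this input, AudioRecord is left uninitialized
    // and startRecording() would throw IllegalStateException.
    if (audioRecord.state != AudioRecord.STATE_INITIALIZED) {
        audioRecord.release()
        return
    }

    audioRecord.startRecording()
    speechRecognizer.recognized.addEventListener(eventHandler)

    speechRecognizer.startContinuousRecognitionAsync().get()
    isRecording = true

    val handler = Handler(Looper.getMainLooper())
    val buffer = ShortArray(bufferSize)

    handler.post(object : Runnable {
        var lastVoiceTimestamp = System.currentTimeMillis()

        override fun run() {
            if (!isRecording) return

            val read = audioRecord.read(buffer, 0, buffer.size)
            var sum = 0.0

            for (i in 0 until read) {
                sum += buffer[i] * buffer[i].toDouble()
            }

            // read() returns a negative error code on failure, so only divide
            // when samples were actually read
            val rms = if (read > 0) sqrt(sum / read) else 0.0

            if (rms > silenceThreshold) {
                lastVoiceTimestamp = System.currentTimeMillis()
            }

            if (System.currentTimeMillis() - lastVoiceTimestamp > silenceTimeout) {
                stopRecording()
            } else {
                handler.postDelayed(this, 100)
            }
        }
    })
}
```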
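The silence check in `startRecording()` comes down to an RMS level test plus a timeout. A minimal sketch of that same logic without any Android dependencies, which could help unit-test the threshold and timeout values off the device, might look like this. `rms` and `SilenceDetector` are hypothetical names that are not in our app; `threshold` and `timeoutMs` stand in for `silenceThreshold` and `silenceTimeout`, whose actual values are not shown above.

```kotlin
import kotlin.math.sqrt

// Root-mean-square level of the first `count` 16-bit PCM samples in `buffer`.
// Returns 0.0 when count <= 0 (AudioRecord.read() returns a negative error
// code on failure), so the caller never divides by zero or gets NaN.
fun rms(buffer: ShortArray, count: Int): Double {
    val n = minOf(count, buffer.size)
    if (n <= 0) return 0.0
    var sum = 0.0
    for (i in 0 until n) {
        val s = buffer[i].toDouble()
        sum += s * s
    }
    return sqrt(sum / n)
}

// Hypothetical helper: remembers when the level last exceeded `threshold`
// and reports when the input has been quiet for longer than `timeoutMs`.
class SilenceDetector(
    private val threshold: Double,
    private val timeoutMs: Long,
    startMs: Long,
) {
    private var lastVoiceMs = startMs

    // Feed one buffer's RMS level and the current time in milliseconds;
    // returns true once the level has stayed at or below `threshold`
    // for more than `timeoutMs`.
    fun update(level: Double, nowMs: Long): Boolean {
        if (level > threshold) lastVoiceMs = nowMs
        return nowMs - lastVoiceMs > timeoutMs
    }
}
```

Inside the `Runnable`, `rms(buffer, read)` would replace the summing loop, and `detector.update(level, System.currentTimeMillis())` would replace the timestamp comparison before calling `stopRecording()`.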

Here is a video of the problem: https://drive.google.com/file/d/1WV_Rkwc5WzcNQh9sYgT6954im2b0scOg/view?usp=sharing
