https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1793
One way to solve this is to mute the microphone programmatically while Text-to-Speech (TTS) is speaking, which prevents Speech-to-Text (STT) from picking up the TTS output. Instead of switching recognition off and on, you keep recognition running and mute the microphone only until the TTS has finished speaking.
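As a rough illustration of the muting idea, here is a minimal sketch. It assumes you feed audio to the recognizer yourself through a push-style stream (for example, something like the SDK's PushAudioInputStream) rather than letting the SDK open the default microphone. MicGate, its methods, and the `received` list standing in for the stream are hypothetical names for this example, not part of the SDK.

```python
import threading


class MicGate:
    """Hypothetical helper: forwards mic frames, or silence while TTS speaks."""

    def __init__(self, sink):
        # sink is any callable that accepts raw PCM bytes; in a real app this
        # might be the write method of a push audio stream (assumption).
        self._sink = sink
        self._muted = threading.Event()

    def mute(self):
        # Call this right before TTS playback starts.
        self._muted.set()

    def unmute(self):
        # Call this once TTS playback has finished.
        self._muted.clear()

    def feed(self, frame: bytes):
        if self._muted.is_set():
            # Signed 16-bit PCM silence is all zero bytes; keeping the frame
            # length unchanged keeps the audio timeline continuous.
            frame = bytes(len(frame))
        self._sink(frame)


# Stand-in for the recognizer's input stream, so the sketch runs on its own.
received = []
gate = MicGate(received.append)

gate.feed(b"\x01\x02")  # forwarded unchanged
gate.mute()             # TTS starts speaking
gate.feed(b"\x03\x04")  # replaced with silence
gate.unmute()           # TTS finished
gate.feed(b"\x05\x06")  # forwarded unchanged again
```

Sending silence instead of dropping frames is a design choice here: the recognizer keeps receiving a steady stream, so recognition never has to be stopped and restarted.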
Another suggestion is to use echo cancellation. However, this feature has limited support through the Microsoft Audio Stack (MAS) in the SDK, and Python doesn't have MAS support, so this might not be a practical solution for you.
Finally, if you're experiencing a delay when stopping and restarting continuous recognition, it might be related to an issue with the SDK. Pausing continuous recognition might seem like a good solution, but calling stop_continuous_recognition and then start_continuous_recognition reportedly does not work very well: the restart has significant overhead and can cause a delay.