The Microsoft Audio Stack (MAS) is a set of audio processing enhancements optimized for speech processing scenarios such as keyword recognition and speech recognition. The Speech SDK integrates MAS, allowing any application or product to use its audio processing capabilities on input audio.
Audio processing pipelines
The Microsoft Audio Stack provides two audio processing pipelines, each optimized for different scenarios:
DSP-based pipeline (default)
The default pipeline (AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT) uses traditional digital signal processing (DSP) algorithms and provides a full set of enhancements: beamforming, dereverberation, acoustic echo cancellation, automatic gain control, and noise suppression. You can disable individual enhancements to match your scenario. This pipeline supports all microphone array geometries and is available on Windows and Linux.
For details on DSP enhancements and code samples, see DSP-based audio processing with the Microsoft Audio Stack.
Model-based echo cancellation pipeline
The model-based pipeline (AUDIO_INPUT_PROCESSING_ENABLE_V2) replaces the DSP-based echo canceller with a machine learning model for improved echo suppression. This pipeline focuses specifically on acoustic echo cancellation and is designed for scenarios where echo suppression quality is critical.
For details and code samples, see Model-based echo cancellation with the Microsoft Audio Stack.
Pipeline comparison
Audio enhancements
| Feature | DSP-based (default) | Model-based (V2) |
|---|---|---|
| Acoustic echo cancellation | ✔ | ✔✔ |
| Noise suppression | ✔ | ✘ |
| Dereverberation | ✔ | ✘ |
| Automatic gain control | ✔ | ✘ |
| Beamforming | ✔ | ✘ |
| Disable individual enhancements | ✔ | ✘ |
✔✔ = ML-enhanced, ✔ = Supported, ✘ = Not supported
Platform and language support
| Platform or language | DSP-based (default) | Model-based (V2) |
|---|---|---|
| Windows x64 | ✔ | ✔ |
| Windows ARM64 | ✔ | ✔ |
| Linux | ✔ | ✘ |
| C++ | ✔ | ✔ |
| C# | ✔ | ✔ |
| Java | ✔ | ✘ |
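The two comparison tables can be read as a simple lookup. As a minimal sketch, the Python snippet below encodes them as plain data and returns the pipeline flags that cover a given set of requirements. It does not call the Speech SDK. Only the two flag names come from this article; `PIPELINES`, `candidate_pipelines`, and the lowercase string keys are hypothetical names chosen for illustration.

```python
# Illustrative only: encodes the comparison tables above as plain data.
# Nothing here calls the Speech SDK; all helper names are hypothetical.

PIPELINES = {
    "AUDIO_INPUT_PROCESSING_ENABLE_DEFAULT": {
        "enhancements": {
            "acoustic echo cancellation",
            "noise suppression",
            "dereverberation",
            "automatic gain control",
            "beamforming",
        },
        "platforms": {"windows x64", "windows arm64", "linux"},
        "languages": {"c++", "c#", "java"},
    },
    "AUDIO_INPUT_PROCESSING_ENABLE_V2": {
        "enhancements": {"acoustic echo cancellation"},
        "platforms": {"windows x64", "windows arm64"},
        "languages": {"c++", "c#"},
    },
}


def candidate_pipelines(required_enhancements, platform, language):
    """Return the pipeline flags whose table rows cover every requirement."""
    required = {e.lower() for e in required_enhancements}
    matches = []
    for flag, support in PIPELINES.items():
        if (
            required <= support["enhancements"]
            and platform.lower() in support["platforms"]
            and language.lower() in support["languages"]
        ):
            matches.append(flag)
    return matches


# Echo cancellation alone on Windows with C++: both pipelines qualify.
print(candidate_pipelines(["Acoustic echo cancellation"], "Windows x64", "C++"))

# Beamforming on Linux with Java: only the DSP-based pipeline qualifies.
print(candidate_pipelines(["Beamforming"], "Linux", "Java"))
```

When both flags are returned, the choice comes down to whether ML-enhanced echo cancellation matters more than the other enhancements, which only the DSP-based pipeline provides.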
Speech SDK integration
Both pipelines are available through the Speech SDK's AudioProcessingOptions class. Key capabilities include:
- Real-time microphone input and file input - Audio processing can be applied to real-time microphone input, streams, and file-based input.
- Speaker reference channel - A speaker reference channel can be specified for echo cancellation, using the SpeakerReferenceChannel.LastChannel option.
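To make the "last channel" convention concrete, the sketch below separates a multichannel capture into microphone channels and a speaker reference channel. It is an illustration of the channel layout only, not the SDK's implementation: it assumes interleaved samples with the loopback signal in the final channel of each frame, and `split_last_channel_reference` is a hypothetical helper name.

```python
def split_last_channel_reference(samples, channel_count):
    """Split interleaved samples into microphone channels and a reference channel.

    Assumes frames are interleaved (ch0, ch1, ..., chN-1, ch0, ...) and that the
    final channel of each frame is the speaker (loopback) reference.
    """
    if channel_count < 2:
        raise ValueError("need at least one microphone channel plus a reference")
    if len(samples) % channel_count != 0:
        raise ValueError("sample count is not a whole number of frames")

    # Channel i is every channel_count-th sample, starting at offset i.
    channels = [samples[i::channel_count] for i in range(channel_count)]
    return channels[:-1], channels[-1]


# Two frames of 3-channel audio: two microphones plus one reference channel.
interleaved = [10, 20, 99, 11, 21, 98]
mics, reference = split_last_channel_reference(interleaved, 3)
print(mics)       # [[10, 11], [20, 21]]
print(reference)  # [99, 98]
```

An echo canceller uses the reference signal to estimate and remove the speaker's contribution from the microphone channels, which is why the reference must be captured alongside the microphone input.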
Privacy and data handling
All processing runs locally on the device where the Speech SDK is used. The Microsoft Audio Stack doesn't stream audio data to Microsoft's cloud services for processing. The only exception is the Conversation Transcription Service, where raw audio is sent to Microsoft's cloud services for processing.