February 2010

Volume 25 Number 02

UI Frontiers - Sound Generation in WPF Applications

By Charles Petzold | February 2010

A few weeks ago I sat in a new Toyota Prius while the agent at the rental car company explained the unfamiliar controls and indicators arrayed on the dashboard. “Wow,” I thought. “Even for a technology as old as the automobile, manufacturers are continually refining the user interface.”

In the broadest sense, the user interface is the place where human and machine interact. While the concept is as old as technology itself, the user interface really blossomed as an art form only with the personal computer revolution.

Just a tiny fraction of today’s personal computer users can remember the days before the advent of the graphical user interfaces of the Apple Macintosh and Microsoft Windows. At the time (the mid- to late 1980s), some pundits feared that standardization of the user interface would impose an oppressive uniformity over applications. That was not the case. Instead, as the availability of standard controls freed designers and programmers from the need to reinvent the scrollbar, user interfaces actually began to evolve and become much more interesting.

In this respect, the new paradigms introduced by Windows Presentation Foundation (WPF) have allowed user interfaces to get even fancier. WPF lays down a strong foundation of retained-mode graphics, animation and 3-D. It adds to that a tree-based hierarchical structure of parent and child elements and a powerful markup language known as XAML. The result is unprecedented flexibility in customizing existing controls through templating, and building new controls by assembling existing components.

But these new concepts aren’t just for client programming. A healthy subset of the Microsoft .NET Framework, XAML and WPF classes have become available in Web-based programming through Silverlight. The day has already arrived when you can actually share custom controls between client applications and Web applications. I’m sure this trend will continue into mobile applications and eventually encompass many different types of information and entertainment systems, while taking advantage of new technologies such as multi-touch.

For these reasons I am convinced that the user interface has become an even more crucial part of application programming. This column will explore the potential of user-interface design in WPF and Silverlight, including the use of cross-platform code when possible.

Sounding Off

It’s not always possible to differentiate between good and bad user-interface choices right away. Clippy, the anthropomorphized paperclip that debuted in Microsoft Office 97, probably seemed like a good idea at the time. For that reason, I’ll focus more on technological potential than on design, and I’ll tend to avoid the term “best practices.” What actually works best is a matter for history and the market to decide.

For example, a good case could be made that computers should not make noise except when they’re playing a video or a sound file in response to a specific command from the user. I’m going to ignore that stricture and show you how to play custom sounds in a WPF application by generating waveform data at runtime.

This sound-making capability isn’t an official part of the .NET Framework yet, but it’s made possible by the NAudio library available on CodePlex (naudio.codeplex.com). Following links from that site, you can check out Mark Heath’s blog for some sample code and Sebastian Gray’s site for tutorials.

You can use the NAudio library in Windows Forms or WPF applications. Because it accesses Win32 API functions through PInvoke, it can’t be used with Silverlight.

For this article, I used NAudio version 1.3.8. When you create a project that uses NAudio, you’ll want to compile for 32-bit processing. Go to the Build tab of the Properties page and select x86 from the Platform Target dropdown.

Although the library provides many features for specialized applications that need to use sound, I’m going to show you a technique that might find its way into a more general-purpose application.

Suppose, for example, your application allows the user to drag objects around the window, and you want this dragging to be accompanied by a simple sound (a sine wave, say) that increases in frequency the further the object gets from the center of the window.

This is a job for waveform audio.

Almost all PCs these days include sound-generation hardware, often implemented with a chip or two right on the motherboard. This hardware is usually not much more than a pair of digital-to-analog converters (DACs). Deliver a constant stream of integers describing a waveform to the two DACs, and stereo sound comes out.

How much data is involved? Applications these days commonly generate “CD quality” sound. The sampling rate is a constant 44,100 samples per second. (The Nyquist Theorem states that the sampling rate needs to be at least twice the highest frequency to be reproduced. Humans are commonly said to hear sounds with frequencies between 20Hz and 20,000Hz, so 44,100 is comfortably adequate.) Each sample is a signed 16-bit integer, a size that implies a signal-to-noise ratio of 96 decibels.
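
To get a feel for the data rate: 44,100 samples per second times 2 bytes per sample times 2 channels comes to 176,400 bytes of waveform data that must be delivered to the hardware every second.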

Making Waves

The Win32 API provides access to the sound-generation hardware through a collection of functions beginning with the word waveOut. The NAudio library encapsulates those functions in a WaveOut class that takes care of the Win32 interoperability and hides much of the messiness as well.

WaveOut requires you to provide a class that implements the IWaveProvider interface. That means the class defines a gettable property of type WaveFormat, which (at the very least) indicates the sample rate and the number of channels, and a method named Read. The arguments to the Read method include a byte-array buffer that the class is required to fill with waveform data. With default settings, this Read method is called 10 times a second. Fall behind a little in filling the buffer and you’ll hear unaesthetic gaps in the sound and ugly static.
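
The interface itself is tiny. In essence (simplified slightly from the NAudio source), it looks like this:

public interface IWaveProvider {
  WaveFormat WaveFormat { get; }
  int Read(byte[] buffer, int offset, int count);
}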

NAudio provides a couple of abstract classes that implement IWaveProvider and make things a little easier for common audio jobs. The WaveProvider16 class defines an abstract Read method that lets you fill the buffer with shorts rather than bytes, so you don’t have to break the samples in half.

Figure 1 shows a simple SineWaveOscillator class that derives from WaveProvider16. The constructor allows specifying a sampling rate, but calls the base class constructor with a second argument indicating one channel for monaural sound.

Figure 1 A Class to Generate Sine Wave Samples for NAudio

class SineWaveOscillator : WaveProvider16 {
  double phaseAngle;   // current phase in radians, kept between 0 and 2 pi

  public SineWaveOscillator(int sampleRate): 
    base(sampleRate, 1) {   // second argument: one channel for monaural sound
  }

  public double Frequency { set; get; }
  public short Amplitude { set; get; }

  // Called by WaveOut whenever it needs another buffer of samples
  public override int Read(short[] buffer, int offset, 
    int sampleCount) {

    for (int index = 0; index < sampleCount; index++) {
      buffer[offset + index] = 
        (short)(Amplitude * Math.Sin(phaseAngle));

      // Advance the phase by the "phase angle increment" for this frequency
      phaseAngle += 
        2 * Math.PI * Frequency / WaveFormat.SampleRate;

      if (phaseAngle > 2 * Math.PI)
        phaseAngle -= 2 * Math.PI;
    }
    return sampleCount;
  }
}

SineWaveOscillator defines two properties named Frequency (of type double) and Amplitude (a short). The program maintains a field named phaseAngle that always ranges between 0 and 2π. For each sample, the phaseAngle is passed to the Math.Sin function, and then increased by a value called the phase angle increment, which is a simple calculation involving the frequency and the sampling rate.
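
For example, at the 44,100Hz sampling rate, a 440Hz tone increases the phase angle by 2π × 440 / 44,100, or roughly 0.0627 radians per sample, so one complete cycle of the sine wave spans a little more than 100 samples.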

(If you’re going to be generating many waveforms simultaneously, you’ll want to optimize processing speed by using integer arithmetic whenever possible, even to the extent of implementing a sine wave table as an array of shorts. But for simple uses of waveform audio, floating point calculations are fine.)
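
If you do go that route, a wavetable oscillator might look something like the following sketch. This is hypothetical code, not part of NAudio or the code download; it replaces Math.Sin with a precomputed table of shorts indexed by a fixed-point phase accumulator:

class TableSineWaveOscillator : WaveProvider16 {
  const int TableSize = 4096;
  static readonly short[] table = new short[TableSize];
  uint phase;           // fixed-point phase accumulator; wraps around naturally
  uint phaseIncrement;  // amount added to the phase for each sample

  static TableSineWaveOscillator() {
    for (int i = 0; i < TableSize; i++)
      table[i] = (short)(short.MaxValue * Math.Sin(2 * Math.PI * i / TableSize));
  }

  public TableSineWaveOscillator(int sampleRate) : base(sampleRate, 1) {
  }

  public void SetFrequency(double frequency) {
    // The full 32-bit range of the accumulator represents one cycle.
    phaseIncrement = (uint)(frequency / WaveFormat.SampleRate * 4294967296.0);
  }

  public override int Read(short[] buffer, int offset, int sampleCount) {
    for (int index = 0; index < sampleCount; index++) {
      // The top 12 bits of the accumulator index the 4,096-entry table.
      buffer[offset + index] = table[phase >> 20];
      phase += phaseIncrement;   // unchecked overflow wraps to the next cycle
    }
    return sampleCount;
  }
}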

To use SineWaveOscillator in a program, you’ll need a reference to the NAudio.dll library and a using directive:

using NAudio.Wave;

Here’s some code that starts playing a sound:

WaveOut waveOut = new WaveOut();
SineWaveOscillator osc = new SineWaveOscillator(44100);
osc.Frequency = 440;
osc.Amplitude = 8192;
waveOut.Init(osc);
waveOut.Play();

Here the Frequency property is initialized to 440Hz. In musical circles, that’s the A above middle C, and is often used as a pitch standard and for tuning purposes. Of course, as the sound is playing, the Frequency property can be changed. To turn off the sound, the Amplitude could be set to 0, but the SineWaveOscillator will continue receiving calls to the Read method. To stop those calls, call Stop on the WaveOut object. When you don’t need the WaveOut object any more, you should call Dispose on it to properly release resources.
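
Shutting down is just the reverse:

waveOut.Stop();      // stops the calls to Read
waveOut.Dispose();   // releases the waveform-audio device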

Off Key

When I used SineWaveOscillator in my sample program, it didn’t do what I wanted. I wanted a sound to accompany objects dragged around the window, and I wanted the frequency of that sound to be based on the distance of the object from the center. But as I moved my objects, the frequency transitions weren’t smooth. I was getting a bumpy glissando (such as when fingers are swept across the keys of a piano or the strings of a harp), whereas what I wanted was a smooth portamento (like a trombone or the opening clarinet of Gershwin’s “Rhapsody in Blue”).

The problem is that each call to the Read method from WaveOut causes an entire buffer to be filled based on the same frequency value. During the time that the Read method is filling the buffer, the frequency can’t change in response to the user dragging the mouse because Read is executing on the user-interface thread.

So how bad is this problem, and how large are these buffers?

The WaveOut class in NAudio includes a DesiredLatency property that, by default, is set to 300 milliseconds. It also includes a NumberOfBuffers property set to 3. (Multiple buffers help throughput because the API can be reading a buffer while an application is filling another.) Hence, each buffer is equivalent to 0.1 second of samples. Through experimentation, I discovered that it is not possible to decrease the DesiredLatency significantly without causing audible gaps. It is possible to increase the number of buffers (be sure to select a value so that the buffer size in bytes is a multiple of 4), but this didn't seem to help significantly. It's also possible to have the Read method called on a secondary thread by passing the result of the static WaveCallbackInfo.FunctionCallback method to the WaveOut constructor, but that didn't help much either.
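
For reference, experimenting with those settings looks something like this (the property values shown are simply the defaults described above, and osc is the oscillator created earlier):

WaveOut waveOut = new WaveOut(WaveCallbackInfo.FunctionCallback());
waveOut.DesiredLatency = 300;   // total latency in milliseconds
waveOut.NumberOfBuffers = 3;    // DesiredLatency is split among these buffers
waveOut.Init(osc);
waveOut.Play();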

It soon became obvious that what I needed was an oscillator that itself performed the portamento while filling the buffer. Instead of SineWaveOscillator, I needed a PortamentoSineWaveOscillator.

PortamentoSineWaveOscillator

I wanted to make other changes as well. Human perception of frequency is logarithmic. The octave is defined as a doubling of frequency, and octaves are audibly similar across the spectrum. To the human nervous system, the difference between 100Hz and 200Hz is the same as the difference between 1000Hz and 2000Hz. In music, each octave comprises 12 audibly equal steps called semitones. Hence, the frequencies of these semitones increase sequentially by a multiplicative factor equal to the twelfth root of two.

I wanted my portamento to be logarithmic as well, so in PortamentoSineWaveOscillator I defined a new property named Pitch that calculates frequency like this:

Frequency = 440 * Math.Pow(2, (Pitch - 69) / 12)

This is a fairly standard formula that comes from conventions used in the Musical Instrument Digital Interface (MIDI), which I’ll discuss in a future column. If you number all the notes of the piano from the bottom to the top where Middle C is assigned a Pitch value of 60, then the A above Middle C is 69, and the formula determines the frequency to be 440Hz. In MIDI these Pitch values are integers, but in the PortamentoSineWaveOscillator class, Pitch is a double, so gradations between notes are possible.
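
Plug a Pitch of 60 into the formula, for example, and you get 440 multiplied by 2 raised to the -9/12 power, or about 261.6Hz, the equal-tempered frequency of Middle C.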

In PortamentoSineWaveOscillator, the Read method detects when Pitch has changed and then gradually changes the value used to calculate the frequency (and hence the phase angle increment) based on the remaining size of the buffer. The logic allows Pitch to change while the method is executing, but that will only happen if Read is being called on a secondary thread.
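
The complete class is in the code download, but the core of the idea can be sketched like this (a simplified, hypothetical version that ramps the sounding pitch toward the target over the remainder of each buffer):

class PortamentoSineWaveOscillator : WaveProvider16 {
  double phaseAngle;
  double currentPitch = 69;   // the pitch actually being sounded

  public PortamentoSineWaveOscillator(int sampleRate) : base(sampleRate, 1) {
  }

  public double Pitch { set; get; }
  public short Amplitude { set; get; }

  public override int Read(short[] buffer, int offset, int sampleCount) {
    for (int index = 0; index < sampleCount; index++) {
      // Move part of the way toward the target pitch on every sample so
      // the frequency glides smoothly instead of jumping between buffers.
      currentPitch += (Pitch - currentPitch) / (sampleCount - index);

      double frequency = 440 * Math.Pow(2, (currentPitch - 69) / 12);
      buffer[offset + index] = (short)(Amplitude * Math.Sin(phaseAngle));
      phaseAngle += 2 * Math.PI * frequency / WaveFormat.SampleRate;

      if (phaseAngle > 2 * Math.PI)
        phaseAngle -= 2 * Math.PI;
    }
    return sampleCount;
  }
}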

As the AudibleDragging program in the code download demonstrates, it worked! The program creates seven little blocks of different colors near the center of the window. When you grab one with the mouse, the program creates a WaveOut object using PortamentoSineWaveOscillator. As the block is dragged, the program simply determines its distance from the center of the window, and sets the pitch of the oscillator based on the following formula:

60 + 12 * distance / 200;

In other words, Middle C plus one octave for every 200 units in distance. AudibleDragging is a silly little program, of course, and it may convince you more than ever that applications should forever be silent. But the potential of generating custom sounds at runtime is simply too powerful to be rejected categorically.

Play On

Of course, you’re not limited to single sine-wave oscillators. You can also derive a mixer from WaveProvider16, and use that to combine several oscillators. You can combine simple waveforms into more complex ones. The use of a Pitch property suggests an easy approach to specifying musical notes.
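
For instance, a bare-bones mixer might look like this hypothetical sketch (not part of NAudio), which simply sums the samples from several 16-bit providers and makes no attempt to keep the sum from overflowing:

class SimpleMixer : WaveProvider16 {
  readonly WaveProvider16[] sources;
  short[] sourceBuffer;

  public SimpleMixer(int sampleRate, params WaveProvider16[] sources)
    : base(sampleRate, 1) {
    this.sources = sources;
  }

  public override int Read(short[] buffer, int offset, int sampleCount) {
    if (sourceBuffer == null || sourceBuffer.Length < sampleCount)
      sourceBuffer = new short[sampleCount];

    Array.Clear(buffer, offset, sampleCount);

    foreach (WaveProvider16 source in sources) {
      source.Read(sourceBuffer, 0, sampleCount);
      for (int i = 0; i < sampleCount; i++)
        buffer[offset + i] += sourceBuffer[i];   // will wrap if the sum overflows
    }
    return sampleCount;
  }
}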

But if it’s music and musical instruments you want your application to blast from the speakers, you’ll be pleased to know that NAudio also includes classes that let you generate MIDI messages from your Windows Forms or WPF applications. I’ll show you how to do that soon.


Charles Petzold is a longtime contributing editor to MSDN Magazine. His most recent book is “The Annotated Turing: A Guided Tour through Alan Turing’s Historic Paper on Computability and the Turing Machine” (Wiley, 2008). Petzold blogs on his Web site, charlespetzold.com.

Thanks to the following technical expert for reviewing this article: Mark Heath