
July 2013

Volume 28 Number 7

DirectX Factor - Simulating an Analog Synthesizer

By Charles Petzold

About 50 years ago, a physicist and engineer named Robert Moog created an electronic music synthesizer with a rather unusual feature: an organ-type keyboard. Some composers of electronic music disparaged such a prosaic and old-fashioned control device, while other composers—and particularly performers—welcomed this development. By the end of the 1960s, Wendy Carlos’ Switched-On Bach had become one of the best-selling classical albums of all time, and the Moog synthesizer had entered the mainstream.

The early Moog synthesizers were modular and programmed with patch cables. In 1970, however, the Minimoog was released—small, easy to use and play, and priced at just $1,495. (A good history of these early synthesizers is the book, “Analog Days: The Invention and Impact of the Moog Synthesizer” [Harvard University Press, 2004], by Trevor Pinch and Frank Trocco.)

We classify the Moog and similar synthesizers as “analog” devices because they create sounds using varying voltages generated from circuitry built from transistors, resistors and capacitors. In contrast, more modern “digital” synthesizers create sound through algorithmic computations or digitized samples. Older devices are further classified as “subtractive” synthesizers: Rather than building a composite sound through the combination of sine waves (a technique called additive synthesis), subtractive synthesizers begin with a waveform rich in harmonics—such as a sawtooth or square wave—and then run it through filters to eliminate some harmonics and alter the timbre of the sound.

A crucial concept pioneered by Robert Moog was “voltage control.” Consider an oscillator, which is the component of a synthesizer that generates a basic audio waveform of some sort. In earlier synthesizers, the frequency of this waveform might be controlled by the value of a resistor somewhere in the circuitry, and that variable resistor might be controlled by a dial. But in a voltage-controlled oscillator (VCO), the frequency of the oscillator is governed by an input voltage. For example, each increase of one volt to the oscillator might double the oscillator’s frequency. In this way, the frequency of the VCO can be controlled by a keyboard that generates a voltage that increases by one volt per octave.
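The volt-per-octave convention maps control voltage to frequency exponentially. Here's a minimal sketch of that relationship; the helper name and its parameters are illustrative, not part of any real VCO specification:

```cpp
#include <cmath>

// Volt-per-octave control: each additional volt doubles the frequency.
// VoltageToFrequency and baseFrequency are illustrative names chosen
// for this sketch.
float VoltageToFrequency(float baseFrequency, float controlVoltage)
{
    return baseFrequency * std::pow(2.0f, controlVoltage);
}
```

One volt above a 110 Hz base yields 220 Hz, and three volts yields 880 Hz, so a keyboard that steps its output voltage by 1/12 volt per key plays an equal-tempered scale.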

In analog synthesizers, the output from one or more VCOs goes into a voltage-controlled filter (VCF) for altering the harmonic content of the waveform. Input voltages to the VCF control the filter’s cutoff frequency, or the sharpness of the filter’s response (the filter’s quality, or Q). The output from the VCF then goes into a voltage-controlled amplifier (VCA), the gain of which is controlled by another voltage.

Envelope Generators

But once you start talking about VCFs and VCAs, things get complicated, and a little background is necessary.

Back in the 19th century, some scientists (most notably Hermann von Helmholtz) began making significant inroads into the exploration of both the physics and perception of sound. Characteristics of sound such as frequency and loudness turned out to be relatively simple compared with the knotty problem of timbre—that quality of sound that allows us to distinguish a piano from a violin or trombone. It was hypothesized (and somewhat demonstrated) that timbre was related to the sound’s harmonic content, which is the various degrees of intensity of the sine curves that constitute the sound.

But when 20th century researchers began investigating further, they discovered it wasn’t this simple. Harmonic content changes over the course of a musical tone, and this contributes to the instrument’s timbre. In particular, the very beginning of a note from a musical instrument is crucial to auditory perception. When a piano hammer or violin bow first touches a string, or vibrating air is propelled into a metal or wooden tube, very complex harmonic activity occurs. This complexity decreases very quickly, but without it, musical tones sound dull and far less interesting and distinctive.

To mimic the complexity of real musical tones, a synthesizer can’t simply turn a note on and off like a switch. (To hear what such a simple synthesizer sounds like, check out the ChromaticButtonKeyboard program in the February 2013 installment of this column.) At the onset of each note, the sound must have a brief “blip” of high volume and varying timbre before stabilizing. When the note ends, the sound must not simply stop, but die out with a decrease in volume and complexity.

For loudness, there’s a general pattern to this process: For a note played on a string, brass or woodwind instrument, the sound rises to a maximum loudness quickly, then dies off a bit and holds steady. When the note ends, it rapidly decreases in volume. These two phases are known as the “attack” and “release.”

For more percussive instruments—including the piano—the note reaches maximum volume quickly during the attack but then dies off slowly if the instrument remains undamped, for example, while holding the piano key down. Once the key is released, the note quickly dies off.

To achieve these effects, synthesizers implement something called an “envelope generator.” Figure 1 shows a fairly standard example called an attack-decay-sustain-release (ADSR) envelope. The horizontal axis is time, and the vertical axis is loudness.

Figure 1 An Attack-Decay-Sustain-Release Envelope

When a key on a keyboard is pressed and the note begins to sound, you hear the attack and decay sections that give a burst of sound at the outset, and then the note stabilizes at the sustain level. When the key is released and the note ends, the release section occurs. For a piano-type sound, the decay time could be a couple seconds, and the sustain level is set at zero so the sound continues to decay as long as the key is held down.

Even the simplest analog synthesizers have two ADSR envelopes: One controls the volume and the other controls the filter, which is usually a low-pass filter. As a note is struck, the cutoff frequency rises rapidly to allow more high-frequency harmonics through, and then decreases somewhat. With enough emphasis, this creates the distinctive chirping sound of an analog synthesizer.

The AnalogSynth Project

Some nine months ago, as I was contemplating using XAudio2 to program a digital simulation of a small 1970s-era analog synthesizer, I realized that the envelope generators would be one of the more challenging aspects of the job. It wasn’t even clear to me whether these envelope generators would be external to the audio-processing stream (and therefore access the SetVolume and SetFilterParameters methods of an XAudio2 voice), or somehow be built in to the audio stream.

I eventually settled on implementing the envelopes as XAudio2 audio effects—more formally known as Audio Processing Objects (APOs). This means the envelope logic works directly on the audio stream. I became more confident with this approach after coding filter logic that duplicates the digital biquad filters built into XAudio2. By using my own filter code, I thought I might be able to change the filter algorithm in the future without major disruptions to the program structure.

Figure 2 shows the screen of the resultant AnalogSynth program, whose source code you can download. Although I was influenced by the layout of the controls on the Minimoog, I kept the actual UI rather simple, using, for example, sliders rather than dials. Most of my focus was on the internals.

Figure 2 The AnalogSynth Screen

The keyboard is a series of custom Key controls processing Pointer events and grouped into Octave controls. The keyboard is actually six octaves in width and can be scrolled horizontally using the thick gray stripe below the keys. A red dot identifies middle C.

The program can play 10 simultaneous notes, but that’s changeable with a simple #define in MainPage.xaml.cs. (Early analog synthesizers like the Minimoog were monophonic.) Each of these 10 voices is an instance of a class I called SynthVoice. SynthVoice has methods to set all the various parameters of the voice (including frequency, volume and envelopes), as well as methods named Trigger and Release to indicate when a key has been pressed or released.

The Minimoog achieved its characteristic “punchy” sound in part by having two oscillators running in parallel and often slightly mistuned, either deliberately or as a result of the frequency drift common in analog circuitry.

For that reason, each SynthVoice creates two instances of an Oscillator class, which are controlled from the upper-left of the control panel shown in Figure 2. The control panel lets you set the waveform and relative volume for these two oscillators, and you can transpose the frequency by one or two octaves up or down. In addition, you can offset the frequency of the second oscillator by up to half an octave.

Each Oscillator instance creates an IXAudio2SourceVoice object, and exposes methods named SetFrequency, SetAmplitude and SetWaveform. SynthVoice routes the two IXAudio2SourceVoice outputs to an IXAudio2SubmixVoice, and then instantiates two custom audio effects called FilterEnvelopeEffect and AmplitudeEnvelopeEffect that it applies to this submix voice. These two effects share a class called EnvelopeGenerator that I’ll describe shortly.

Figure 3 shows the organization of components in each SynthVoice. For the 10 SynthVoice objects, there are a total of 20 IXAudio2SourceVoice instances going into 10 IXAudio2SubmixVoice instances, which are then routed to a single IXAudio2MasteringVoice. I use a sampling rate of 48,000 Hz and 32-bit floating-point samples throughout.

Figure 3 The Structure of the SynthVoice Class

The user controls the filter from the center section of the control panel. A ToggleButton allows the filter to be bypassed; otherwise, the Cutoff frequency is relative to the note that’s being played. (In other words, the cutoff frequency of the filter tracks the keyboard.) The Emphasis slider controls the filter’s Q setting. The Envelope slider controls the degree to which the envelope affects the filter cutoff frequency.

The four sliders associated with the filter envelope and the loudness envelope work similarly. The Attack, Decay and Release sliders are all durations from 10 milliseconds to 10 seconds in a logarithmic scale. The sliders have tool-tip value converters to display the duration associated with the settings.
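A slider that spans 10 milliseconds to 10 seconds on a logarithmic scale reduces to a simple exponential mapping. This is a sketch of one such mapping under that assumption; it's not the program's actual converter code:

```cpp
#include <cmath>

// Map a normalized slider position in [0, 1] onto a logarithmic
// duration scale from 10 msec to 10,000 msec (10 seconds).
// SliderToDurationMsec is an illustrative helper, not AnalogSynth code.
float SliderToDurationMsec(float position)
{
    // Three decades: 10 * 1000^0 = 10 msec; 10 * 1000^1 = 10,000 msec
    return 10.0f * std::pow(1000.0f, position);
}
```

The midpoint of such a slider lands at about 316 milliseconds rather than 5 seconds, which is what makes short attack and decay times easy to dial in precisely.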

AnalogSynth makes no volume adjustments for the 20 potential simultaneous IXAudio2SourceVoice instances, or to counteract the tendency of digital biquad filters to amplify audio near the cutoff frequency. Consequently, AnalogSynth makes it easy to overload the audio. To help the user avoid this, the program uses the XAudio2CreateVolumeMeter function to create an audio effect that monitors the outgoing sound. If the green dot in the upper-right corner changes to red, output audio is being clipped and you should use the slider at the far right to decrease the volume.

Early synthesizers used patch cords to connect components. As a result of this legacy, a particular synthesizer setup is still known as a “patch.” If you find a patch that makes a sound you want to keep, press the Save button and assign a name. Press the Load button to get a list of previously saved patches and select one. These patches (as well as the current setup) are stored in the local settings area.

The Envelope Generator Algorithm

Code that implements an envelope generator is basically a state machine, with five sequential states that I called Dormant, Attack, Decay, Sustain and Release. From a UI perspective, it seems most natural to specify attack, decay and release in terms of time durations, but when actually performing the calculations you need to convert each duration to a rate—an increase or decrease of loudness (or filter cutoff frequency) per unit time. The two audio effects in AnalogSynth use these changing levels to implement the effect.

This state machine is not always as sequential as the diagram in Figure 1 would seem to imply. For example, what happens when a key is pressed and released so quickly that the envelope has not yet reached the sustain section when the key is released? At first I thought the envelope should be allowed to complete its attack and decay sections and then go right into the release section, but this did not work well for a piano-type envelope. In a piano envelope, the sustain level is zero and the decay time is relatively long. A key quickly pressed and released still had a long decay—as if it were not released at all!

I decided that for a quick press and release, I would let the attack section complete, but then immediately jump to the release section. This meant that the final rate of decrease would need to be calculated based on the current level. This explains why there’s a difference in how the release is handled in the structure for the envelope parameters, shown here:

struct EnvelopeGeneratorParameters
{
  float baseLevel;
  float attackRate;   // in level/msec
  float peakLevel;
  float decayRate;    // in level/msec
  float sustainLevel;
  float releaseTime;  // in msec
};

For the amplitude envelope, baseLevel is set to 0, peakLevel is set to 1 and sustainLevel is somewhere between those values. For the filter envelope, the three levels refer to a multiplier applied to the filter cutoff frequency: baseLevel is 1, and peakLevel is governed by the slider labeled “Envelope” and can range from 1 to 16. That frequency multiplier of 16 corresponds to four octaves.
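The relationship between a frequency multiplier and octaves is logarithmic, which is why the maximum multiplier of 16 corresponds to four octaves. A quick check (the helper name is illustrative):

```cpp
#include <cmath>

// A frequency multiplier of 2^n spans n octaves, so converting a
// multiplier back to octaves is a base-2 logarithm.
// MultiplierToOctaves is an illustrative name for this sketch.
float MultiplierToOctaves(float multiplier)
{
    return std::log2(multiplier);
}
```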

Both AmplitudeEnvelopeEffect and FilterEnvelopeEffect share the EnvelopeGenerator class. Figure 4 shows the EnvelopeGenerator header file. Notice the public method to set the envelope parameters, and two public methods named Attack and Release that trigger the envelope to begin and finish up. These three methods should be called in that order. The code is not written to deal with an envelope whose parameters change midway through its progress.

Figure 4 The EnvelopeGenerator Header File

class EnvelopeGenerator
{
private:
  enum class State
  {
    Dormant, Attack, Decay, Sustain, Release
  };

  EnvelopeGeneratorParameters params;
  float level;
  State state;
  bool isReleased;
  float releaseRate;

public:
  EnvelopeGenerator();
  void SetParameters(const EnvelopeGeneratorParameters params);
  void Attack();
  void Release();
  bool GetNextValue(float interval, float& value);
};

The current calculated value from the envelope generator is obtained through repeated calls to GetNextValue. The interval argument is in milliseconds, and the method computes a new value based on that interval, possibly switching states in the process. When the envelope has finished the Release section, GetNextValue returns true to indicate that the envelope has completed, but I don’t actually use that return value elsewhere in the program.

Figure 5 shows the implementation of the EnvelopeGenerator class. Near the top of the GetNextValue method is the code to skip directly to the Release state when a key is released, and the calculation of a release rate based on the current level and the release time.

Figure 5 The Implementation of EnvelopeGenerator

EnvelopeGenerator::EnvelopeGenerator() : state(State::Dormant)
{
  params.baseLevel = 0;
}

void EnvelopeGenerator::SetParameters(const EnvelopeGeneratorParameters params)
{
  this->params = params;
}

void EnvelopeGenerator::Attack()
{
  state = State::Attack;
  level = params.baseLevel;
  isReleased = false;
}

void EnvelopeGenerator::Release()
{
  isReleased = true;
}

bool EnvelopeGenerator::GetNextValue(float interval, float& value)
{
  bool completed = false;

  // If note is released, go directly to Release state,
  // except if still attacking
  if (isReleased &&
      (state == State::Decay || state == State::Sustain))
  {
    state = State::Release;
    releaseRate = (params.baseLevel - level) / params.releaseTime;
  }

  switch (state)
  {
  case State::Dormant:
    level = params.baseLevel;
    completed = true;
    break;

  case State::Attack:
    level += interval * params.attackRate;

    if ((params.attackRate > 0 && level >= params.peakLevel) ||
        (params.attackRate < 0 && level <= params.peakLevel))
    {
      level = params.peakLevel;
      state = State::Decay;
    }
    break;

  case State::Decay:
    level += interval * params.decayRate;

    if ((params.decayRate > 0 && level >= params.sustainLevel) ||
        (params.decayRate < 0 && level <= params.sustainLevel))
    {
      level = params.sustainLevel;
      state = State::Sustain;
    }
    break;

  case State::Sustain:
    break;

  case State::Release:
    level += interval * releaseRate;

    if ((releaseRate > 0 && level >= params.baseLevel) ||
        (releaseRate < 0 && level <= params.baseLevel))
    {
      level = params.baseLevel;
      state = State::Dormant;
      completed = true;
    }
    break;
  }

  value = level;
  return completed;
}

A Pair of Audio Effects

Both the AmplitudeEnvelopeEffect and FilterEnvelopeEffect classes derive from CXAPOParametersBase so they can accept parameters, and both classes also maintain an instance of the EnvelopeGenerator class for performing the envelope calculations. The parameter structures for these two audio effects are named AmplitudeEnvelopeParameters and FilterEnvelopeParameters.

The AmplitudeEnvelopeParameters structure is merely an EnvelopeGeneratorParameters structure and a Boolean keyPressed field that’s true when the key associated with this voice is pressed and false when it’s released. (The FilterEnvelopeParameters structure is just a bit more complex because it needs to incorporate a base-level filter cutoff frequency and Q setting.) Both effects classes maintain their own keyPressed data members that can be compared with the parameters value to determine when the envelope attack or release state should be triggered.

You can see how this works in Figure 6, which shows the code for the Process override in AmplitudeEnvelopeEffect. If the effect is enabled and the local keyPressed value is false but the keyPressed value in the effect parameters is true, then the effect makes calls to the SetParameters and Attack methods of the EnvelopeGenerator instance. If the opposite is the case—the local keyPressed value is true but the one in the parameters is false—then the effect calls the Release method.

Figure 6 The Process Override in AmplitudeEnvelopeEffect

void AmplitudeEnvelopeEffect::Process(UINT32 inpParamCount,
    const XAPO_PROCESS_BUFFER_PARAMETERS * pInpParam,
    UINT32 outParamCount,
    XAPO_PROCESS_BUFFER_PARAMETERS * pOutParam,
    BOOL isEnabled)
{
  // Get effect parameters
  AmplitudeEnvelopeParameters * pParams =
    reinterpret_cast<AmplitudeEnvelopeParameters *>(BeginProcess());

  // Get buffer pointers and other information
  const float * pSrc = static_cast<float const*>(pInpParam[0].pBuffer);
  float * pDst = static_cast<float *>(pOutParam[0].pBuffer);
  int frameCount = pInpParam[0].ValidFrameCount;
  int numChannels = waveFormat.nChannels;

  if (!isEnabled)
  {
    // Effect disabled: copy input to output unchanged
    for (int frame = 0; frame < frameCount; frame++)
      for (int channel = 0; channel < numChannels; channel++)
      {
        int index = numChannels * frame + channel;
        pDst[index] = pSrc[index];
      }
  }
  else
  {
    // Key being pressed
    if (!this->keyPressed && pParams->keyPressed)
    {
      // envelopeParams is the embedded EnvelopeGeneratorParameters
      envelopeGenerator.SetParameters(pParams->envelopeParams);
      envelopeGenerator.Attack();
      this->keyPressed = true;
    }
    // Key being released
    else if (this->keyPressed && !pParams->keyPressed)
    {
      envelopeGenerator.Release();
      this->keyPressed = false;
    }

    // Calculate interval in msec
    float interval = 1000.0f / waveFormat.nSamplesPerSec;

    for (int frame = 0; frame < frameCount; frame++)
    {
      float volume;
      envelopeGenerator.GetNextValue(interval, volume);

      for (int channel = 0; channel < numChannels; channel++)
      {
        int index = numChannels * frame + channel;
        pDst[index] = volume * pSrc[index];
      }
    }
  }

  // Set output parameters
  pOutParam[0].ValidFrameCount = pInpParam[0].ValidFrameCount;
  pOutParam[0].BufferFlags = pInpParam[0].BufferFlags;

  EndProcess();
}

The effect could call the GetNextValue method of EnvelopeGenerator either for every Process call (in which case the interval argument would indicate 10 milliseconds) or for every sample (in which case the interval is more like 21 microseconds). Although the first approach should be adequate, I decided on the second for theoretically smoother transitions.

The floating-point volume value returned from the GetNextValue call ranges from 0 (when a note is first beginning or ending) to 1 for the culmination of the attack. The effect simply multiplies the floating-point samples by this number.

Now the Fun Begins

I’ve spent so much time coding the AnalogSynth program that I haven’t had much time to play around with it. It could very well be that some of the controls and parameters need some fine-tuning, or perhaps rather coarser tuning! In particular, long decay and release times on the volume don’t sound quite right, and they suggest that the envelope changes to amplitudes should be logarithmic rather than linear.
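One common way to address that linearity problem (an assumption on my part, not what AnalogSynth currently does) is to reinterpret the linear envelope level as a position along a decibel ramp, so equal envelope steps produce roughly equal perceived loudness steps:

```cpp
#include <cmath>

// Map a linear envelope level in [0, 1] to an amplitude by treating
// the level as a position along a decibel ramp. LevelToAmplitude and
// the -60 dB floor are illustrative choices, not from AnalogSynth.
float LevelToAmplitude(float level, float floorDb = -60.0f)
{
    if (level <= 0.0f)
        return 0.0f;                        // silent at or below zero
    float db = floorDb * (1.0f - level);    // level 1 maps to 0 dB
    return std::pow(10.0f, db / 20.0f);     // decibels to amplitude
}
```

With this mapping, a level of 0.5 produces an amplitude of about 0.032 (-30 dB) rather than 0.5, which is much closer to "half as loud" to the ear.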

I’m also intrigued by the use of touch input with the on-screen keyboard. The keys on a real piano are sensitive to the velocity with which they are struck, and synthesizer keyboards have attempted to emulate that same feel. Most touchscreens, however, can’t detect touch velocity or pressure. But they can be made sensitive to slight finger movements on the screen, which is beyond the capability of a real keyboard. Can on-screen keyboards be made more responsive in this way? There’s only one way to find out!

Charles Petzold is a longtime contributor to MSDN Magazine and the author of “Programming Windows, 6th edition” (O’Reilly Media, 2012), a book about writing applications for Windows 8.

Thanks to the following technical expert for reviewing this article: James McNellis (Microsoft)
James McNellis is a C++ aficionado and a software developer on the Visual C++ team at Microsoft, where he builds C++ libraries and maintains the C Runtime libraries (CRT). He tweets at @JamesMcNellis.