Web Audio API overview (Part 1 of 2)

In the next two blog posts I'll be showing you some essential features of the Web Audio API. You can find its specification here. The Web Audio API provides nearly all the functionality of a regular synthesizer, which is one of the reasons it is so powerful.

Anyway, let's get going. I'm going to talk you through the source of a little audio visualizer in this post and the next one. Here is a demo of the final product. In this first post I'll concentrate on a simplified version.

Warning: The code shown in this blog post is outdated and will only work in older versions of Chrome.

Now let's look at the HTML body structure. In this case it is very simple: there is a main #container that (as the name says) contains a canvas and an audio element, followed by a couple of script tags.


<body>
  <div id="container">
    <canvas height="200" width="500" id="fft"></canvas>
    <audio id="audio" src="IO2010.mp3" preload controls></audio>
  </div>
  <script>
  // requestAnim shim layer by Paul Irish
    window.requestAnimFrame = (function(){
      return  window.requestAnimationFrame       ||
              window.webkitRequestAnimationFrame ||
              window.mozRequestAnimationFrame    ||
              window.oRequestAnimationFrame      ||
              window.msRequestAnimationFrame     ||
              function(callback, element){
                window.setTimeout(callback, 1000 / 60);
              };
    })();
  </script>
  <script>
    // Some Javascript
  </script>
</body>

You may already know what the contents of the first script tag are for. It is a shim by Paul Irish that makes it easier to use requestAnimationFrame() across browsers (to get more info, read his blog post). I'm not going to go further into how it works; all it really does is help the browser run animations more smoothly and efficiently.
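
Just to illustrate the pattern outside of the visualizer, a typical animation loop with this shim looks something like the following little sketch (the tick function is just a placeholder name of my own):

// A minimal animation loop using the shim.
function tick() {
  // Ask the browser to call tick() again before the next repaint
  // (or after roughly 1000 / 60 ms via the setTimeout fallback).
  requestAnimFrame(tick);
  // ...update and redraw your scene here...
}
tick();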

The important parts for this blog post are the contents of the 2nd script tag:

// The audio element
audioElement = document.getElementById('audio');

// The canvas, its context and fillstyle
canvas = document.getElementById('fft');
ctx = canvas.getContext('2d');
ctx.fillStyle = "white";

// Create new Audio Context and an audio analyzer
audioContext = new webkitAudioContext();
analyser = audioContext.createAnalyser();

// Canvas' height and width
CANVAS_HEIGHT = canvas.height;
CANVAS_WIDTH = canvas.width;
// We'll need the offset later
OFFSET = 100;
// Spacing between the individual bars
SPACING = 10;
// Initialize and start drawing
// when the audio starts playing
window.onload = init;
audioElement.addEventListener('play', draw);

function init() {
  // Take input from audioElement
  source = audioContext.createMediaElementSource(audioElement);
  // Connect the stream to an analyzer
  source.connect(analyser);
  // Connect the analyzer to the speakers
  analyser.connect(audioContext.destination);
  // Start the animation
  draw();
}

function draw() {
  // See http://paulirish.com/2011/requestanimationframe-for-smart-animating/
  requestAnimFrame(draw, canvas);
  // New typed array for the raw frequency data
  freqData = new Uint8Array(analyser.frequencyBinCount);
  // Put the raw frequency into the newly created array
  analyser.getByteFrequencyData(freqData);
  // Clear the canvas
  ctx.clearRect(0, 0, CANVAS_WIDTH, CANVAS_HEIGHT);
  // This loop draws all the bars
  for (var i = 0; i < freqData.length - OFFSET; i++) {
    // Work out the height of the current bar
    // from the magnitude of the current frequency
    var magnitude = freqData[i + OFFSET];
    // Draw a bar from the bottom up (hence the negative magnitude)
    ctx.fillRect(i * SPACING, CANVAS_HEIGHT, SPACING / 2, -magnitude);
  }
}

The webkitAudioContext() we create is the context in which all of the audio interactions take place, similar to the contexts mentioned in my previous post.
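
If you want to try this in a current browser, note that the context is no longer vendor-prefixed. As a rough sketch (not what the demo above uses), the modern equivalent of the context and analyzer setup would be something like this; newer browsers may also refuse to start the context until the user has interacted with the page:

// Modern, unprefixed equivalent of the context and analyzer setup.
var audioContext = new AudioContext();
var analyser = audioContext.createAnalyser();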

You can divide the code into two main parts:

The setup

sets up all the variables (except freqData) and defines how each bar should be drawn. The init function then connects the source of the audio (the audio element) to the analyzer, and the analyzer to the destination, i.e. the speakers.

You can imagine connecting up the source, analyzer and destination as taking a few cables and plugging them into some hardware. The only difference is that here it is all virtual!

The animation's draw() function

takes care of drawing the bars.

But what does that mean exactly?

  1. Call requestAnimFrame() to schedule the next run of draw() at just the right time.

  2. Create a typed array (called freqData) to hold the individual frequency values. The parameter passed at creation is the size of the array (in this case 1024 items; see the sketch after this list).

  3. Call getByteFrequencyData() on the analyzer to put the current frequency values into freqData (it doesn't return anything; it writes straight into the array you pass in).

  4. Simply clear the canvas.

  5. Loop through the frequency data (skipping the first OFFSET values) and each time:

    • Get the magnitude of the current frequency

    • Draw a bar whose height matches that magnitude. The magnitude has to be passed in as a negative value so that the bar is drawn upwards from the bottom of the canvas.

  6. Lather. Rinse. Repeat.
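
To make steps 2 and 3 a bit more concrete, here is a small sketch of where the 1024 comes from and what ends up in freqData (assuming the analyzer's default fftSize of 2048):

// The analyzer computes an FFT over fftSize samples (2048 by default),
// which yields fftSize / 2 frequency bins.
console.log(analyser.fftSize);           // 2048
console.log(analyser.frequencyBinCount); // 1024

// getByteFrequencyData writes one value per bin into the array,
// scaled to the range 0..255 (0 = silent, 255 = maximum level).
var freqData = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(freqData);

// fillRect with a negative height draws upwards from the given y
// coordinate, which is why draw() passes -magnitude in step 5.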

Nearly everything that has to do with the Web Audio API inherits from an object called AudioNode, which provides the basic structure for working with audio. The analyzer we are using here, for example, also inherits from AudioNode. Other examples of audio nodes are BiquadFilter, LowPassFilter, AudioGainNode and many more. I will be covering some of them in part 2 of this mini-series.
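
As a small teaser for part 2, here is a rough sketch of how another audio node, a gain node for controlling the volume, could be patched into the same chain. It uses the modern, unprefixed API names, so it won't match the prefixed code above exactly:

// Sketch: a gain node between the analyzer and the speakers.
var gainNode = audioContext.createGain();

source.connect(analyser);
analyser.connect(gainNode);
gainNode.connect(audioContext.destination);

// gain is an AudioParam; setting its value to 0.5 halves the volume.
gainNode.gain.value = 0.5;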