
Pete Brown is a Senior Program Manager with Microsoft on the developer community team led by Scott Hanselman, as well as a former Microsoft Silverlight MVP, INETA speaker, and RIA Architect for Applied Information Sciences, where he worked for over 13 years. Pete's focus at Microsoft is the community around client application development (WPF, Silverlight, Windows Phone, Surface, Windows Forms, C++, Native Windows API and more). Pete’s site is http://10rem.net.

Creating Sound using MediaStreamSource in Silverlight 3 Beta

03.24.2009

Creating sound from raw bits is, believe it or not, slightly more involved than creating video from raw bits in Silverlight 3.

Getting samples to Silverlight is an interesting task. Your code will respond to Silverlight’s request for a sample with a buffer of samples of a size you determine. Rather than a push model, like generating video, it is a pull model.

Let’s start with the thing that defines our sample stream: the WaveFormatEx structure.

WaveFormatEx

In order to create sound, you’ll need to populate the WaveFormatEx structure and serialize it to a hex-encoded string. Luckily, there’s code out there to do that already (see the attached project below).
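To give a sense of what that serialization involves, here is a minimal standalone sketch. It assumes the string is simply the 18-byte WAVEFORMATEX header written field by field in little-endian order and hex-encoded. The class name `WaveFormatHexSketch` and its method signature are hypothetical and are not the API of the WaveFormatEx class in the attached project.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in used only to illustrate the byte layout;
// the attached project's WaveFormatEx class may differ in detail.
public static class WaveFormatHexSketch
{
    // Writes the WAVEFORMATEX fields in order and returns them as hex.
    public static string ToHexString(short formatTag, short channels,
        int samplesPerSec, short bitsPerSample)
    {
        short blockAlign = (short)(channels * (bitsPerSample / 8));
        int avgBytesPerSec = samplesPerSec * blockAlign;

        using (var ms = new MemoryStream())
        using (var writer = new BinaryWriter(ms))
        {
            // BinaryWriter writes integers in little-endian order.
            writer.Write(formatTag);       // wFormatTag (1 = PCM)
            writer.Write(channels);        // nChannels
            writer.Write(samplesPerSec);   // nSamplesPerSec
            writer.Write(avgBytesPerSec);  // nAvgBytesPerSec
            writer.Write(blockAlign);      // nBlockAlign
            writer.Write(bitsPerSample);   // wBitsPerSample
            writer.Write((short)0);        // cbSize: no extra bytes for PCM
            writer.Flush();

            var sb = new StringBuilder();
            foreach (byte b in ms.ToArray())
            {
                sb.Append(b.ToString("X2"));
            }
            return sb.ToString();
        }
    }

    public static void Main()
    {
        // 16-bit stereo PCM at 44100 Hz
        Console.WriteLine(ToHexString(1, 2, 44100, 16));
    }
}
```

For 16-bit stereo PCM at 44100 Hz this produces a 36-character string, two hex digits for each of the 18 header bytes.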

Once you have that class in your project, you’ll need to populate the members:

_waveFormat = new WaveFormatEx();
_waveFormat.BitsPerSample = BitsPerSample; // 16-bit samples
_waveFormat.AvgBytesPerSec = (int)ByteRate;
_waveFormat.Channels = ChannelCount;
_waveFormat.BlockAlign = ChannelCount * (BitsPerSample / 8);
_waveFormat.ext = null; // no extra format data for PCM
_waveFormat.FormatTag = WaveFormatEx.FormatPCM;
_waveFormat.SamplesPerSec = SampleRate;
_waveFormat.Size = 0; // must be zero

_waveFormat.ValidateWaveFormat();

BitsPerSample: This is going to be 16, for 16-bit (two-byte) samples.
AvgBytesPerSec: SampleRate * ChannelCount * BitsPerSample / 8.
Channels: The number of channels you have. Typically this is 1 for mono, 2 for stereo.
BlockAlign: Channels * (BitsPerSample / 8), the size in bytes of one sample across all channels.
ext: This appears to hold any extra format bytes that follow the base structure. PCM has none, so it needs to be null.
FormatTag: PCM or IEEE format. I’ve only used PCM. In fact, in the WaveFormatEx example, if you use anything other than PCM, it will throw an error when you try to validate the data.
SamplesPerSec: The number of samples per second. Usually this is something like 44100 for CD quality.
Size: Must be 0.
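As a quick worked example of the two derived fields, the following standalone sketch computes BlockAlign and AvgBytesPerSec from the formulas listed above. The `PcmFormatMath` class is a hypothetical helper for illustration and is not part of the attached project.

```csharp
using System;

// Hypothetical helper that restates the formulas from the field list.
public static class PcmFormatMath
{
    // Bytes occupied by one sample across all channels.
    public static int BlockAlign(int channels, int bitsPerSample)
    {
        return channels * (bitsPerSample / 8);
    }

    // Bytes of audio data consumed per second of playback.
    public static int AvgBytesPerSec(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * BlockAlign(channels, bitsPerSample);
    }

    public static void Main()
    {
        Console.WriteLine(BlockAlign(2, 16));            // 4
        Console.WriteLine(AvgBytesPerSec(44100, 2, 16)); // 176400
        Console.WriteLine(AvgBytesPerSec(44100, 1, 16)); // 88200
    }
}
```

So the 16-bit stereo stream used in this post needs 176,400 bytes for every second of audio.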

Describing your Stream

The next thing you need to do is set up a few dictionaries full of options for the stream. You typically do this in the OpenMediaAsync method:

_startPosition = _currentPosition = 0;

// Init
Dictionary<MediaStreamAttributeKeys, string> streamAttributes =
    new Dictionary<MediaStreamAttributeKeys, string>();
Dictionary<MediaSourceAttributesKeys, string> sourceAttributes =
    new Dictionary<MediaSourceAttributesKeys, string>();
List<MediaStreamDescription> availableStreams =
    new List<MediaStreamDescription>();

// Stream Description and WaveFormatEx
streamAttributes[MediaStreamAttributeKeys.CodecPrivateData] =
    _waveFormat.ToHexString(); // wfx
MediaStreamDescription msd =
    new MediaStreamDescription(MediaStreamType.Audio,
        streamAttributes);
_audioDesc = msd;

// next, add the description so that Silverlight will
// actually request samples for it
availableStreams.Add(_audioDesc);

// Tell Silverlight we have an endless stream
sourceAttributes[MediaSourceAttributesKeys.Duration] =
    TimeSpan.FromMinutes(0).Ticks.ToString(
        CultureInfo.InvariantCulture);

// we don't support seeking on our stream
sourceAttributes[MediaSourceAttributesKeys.CanSeek] =
    false.ToString();

// tell Silverlight we're done opening our media
ReportOpenMediaCompleted(sourceAttributes, availableStreams);

Reporting Samples

Next, we need to handle sample requests. In this example, I’m going to return stereo noise, generated by creating a random value at each sample point. Since we have two channels and each gets its own random values, the effect will be in stereo. I do this in GetSampleAsync:

// 256 sample points per request, one value per channel
int numSamples = ChannelCount * 256;
int bufferByteCount = BitsPerSample / 8 * numSamples;

// fill the stream with noise
for (int i = 0; i < numSamples; i++)
{
    short sample = (short)_random.Next(
        short.MinValue, short.MaxValue);

    _stream.Write(BitConverter.GetBytes(sample),
        0,
        sizeof(short));
}

// Send out the next sample
MediaStreamSample msSamp = new MediaStreamSample(
    _audioDesc,
    _stream,
    _currentPosition,
    bufferByteCount,
    _currentTimeStamp,
    _emptySampleDict);

// Move our timestamp and position forward
_currentTimeStamp += _waveFormat.AudioDurationFromBufferSize(
    (uint)bufferByteCount);
_currentPosition += bufferByteCount;

ReportGetSampleCompleted(msSamp);
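The offset and count bookkeeping in that call can be checked outside Silverlight. The sketch below is a standalone approximation that fills a MemoryStream the same way and prints the offset and byte count each request would report. `NoiseBufferSketch` and `AppendNoise` are hypothetical names, and no MediaStreamSource types are involved.

```csharp
using System;
using System.IO;

// Standalone approximation of the buffer fill in GetSampleAsync.
public static class NoiseBufferSketch
{
    public const int ChannelCount = 2;
    public const int BitsPerSample = 16;

    // Appends one buffer of random 16-bit samples and returns the bytes written.
    public static int AppendNoise(Stream stream, Random random, int framesPerBuffer)
    {
        int numSamples = ChannelCount * framesPerBuffer;
        for (int i = 0; i < numSamples; i++)
        {
            short sample = (short)random.Next(short.MinValue, short.MaxValue);
            stream.Write(BitConverter.GetBytes(sample), 0, sizeof(short));
        }
        return numSamples * (BitsPerSample / 8);
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        var random = new Random();
        long position = 0;

        // Simulate three sample requests of 256 frames each.
        for (int call = 0; call < 3; call++)
        {
            int written = AppendNoise(stream, random, 256);
            Console.WriteLine("offset {0}, count {1}, stream length {2}",
                position, written, stream.Length);
            position += written;
        }
    }
}
```

Each request adds 1024 bytes and the stream is never trimmed, so a long-running source built this way would presumably want to reuse or reset its buffer at some point.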

The number of bytes you buffer will depend on what you can get away with. Ideally, you want a buffer equal to only one sample per channel for that call. In reality, you can’t get to that even with dedicated professional audio gear and on-the-metal sound generation. So experiment with some buffer sizes, and keep in mind that the more work you do in code, the larger your buffer will likely need to be. This is because you’ll likely be filling an internal sound buffer on a background thread while Silverlight pulls from that buffer on its own thread. You want to make sure Silverlight never gets ahead of you, but also that you don’t get more than about 10ms ahead of the actual audio output (10ms is roughly the smallest delay/difference a human ear can typically discern).
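To put numbers on that trade-off, the following sketch converts a few buffer sizes into playback time for the 44100 Hz, 16-bit stereo format used here. It assumes timestamps are expressed in 100-nanosecond ticks, as TimeSpan does. `BufferLatency.DurationTicks` is a hypothetical helper that is presumably similar in spirit to AudioDurationFromBufferSize but is not taken from the attached project.

```csharp
using System;

// Hypothetical helper for estimating how much audio one buffer holds.
public static class BufferLatency
{
    // Duration of a PCM buffer in 100-nanosecond ticks.
    public static long DurationTicks(long bufferByteCount, int avgBytesPerSec)
    {
        return bufferByteCount * 10000000L / avgBytesPerSec;
    }

    public static void Main()
    {
        const int bytesPerFrame = 2 * 2;                 // 2 channels * 2 bytes
        const int avgBytesPerSec = 44100 * bytesPerFrame;

        foreach (int frames in new[] { 128, 256, 512, 1024 })
        {
            long bytes = frames * bytesPerFrame;
            // 10,000 ticks make one millisecond.
            double ms = DurationTicks(bytes, avgBytesPerSec) / 10000.0;
            Console.WriteLine("{0} frames = {1} bytes = {2:F2} ms", frames, bytes, ms);
        }
    }
}
```

The 256-frame buffer from the example above holds about 5.8 ms of audio, so it stays under the 10ms figure, while a 512-frame buffer already exceeds it.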

[detour]

FWIW, this is the sound card I use for my pro-audio (click to see the specs):

EMU 1616 PCI Digital Audio System

And for things other than code projects like what I’m doing here, this is my setup:

http://www.flickr.com/photos/psychlist1972/3313289838/

Pete's EX-5, MT32 and SH-32

I’ve been playing with synthesizers since I was a teenager. My first real one was a Roland HS-60 (a Juno 106 in disguise), followed by an Alpha Juno and a Korg DW-6000. During the late 80s and early 90s in high school and college, I worked at a music store so I was also able to play around with lots of cool Roland and Korg synthesizers, plus some fun old analog beaters that often came in on trade.

[/detour]

Note that we use some helper functions from WaveFormatEx in this call in order to set the time stamp for this sample set. Note also that I use the built-in BitConverter to get the two bytes from the 16-bit sample. BitConverter.GetBytes returns an array sized to contain the bytes of a variable of the given type. Finally, notice the _emptySampleDict. That is, as it is named, an empty dictionary of MediaSampleAttributeKeys/strings:

// you only need sample attributes for video
private Dictionary<MediaSampleAttributeKeys, string> _emptySampleDict =
new Dictionary<MediaSampleAttributeKeys, string>();
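One detail about BitConverter is worth a short sketch: it returns bytes in the byte order of the machine it runs on, while PCM wave data is little-endian. On the usual little-endian hardware the two agree. The `SampleBytes.ToLittleEndian` helper below is hypothetical and only shows how the order could be made explicit.

```csharp
using System;

// Hypothetical helper that makes the byte order of a sample explicit.
public static class SampleBytes
{
    // Returns the two bytes of a 16-bit sample, low byte first,
    // regardless of the byte order of the machine running the code.
    public static byte[] ToLittleEndian(short sample)
    {
        return new[] { (byte)(sample & 0xFF), (byte)((sample >> 8) & 0xFF) };
    }

    public static void Main()
    {
        short sample = 0x1234;

        // True on little-endian hardware, where GetBytes matches the helper.
        Console.WriteLine(BitConverter.IsLittleEndian);
        Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(sample)));
        Console.WriteLine(BitConverter.ToString(ToLittleEndian(sample))); // 34-12
    }
}
```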

To round it out, here are the other private variables in this example:

private WaveFormatEx _waveFormat;
private MediaStreamDescription _audioDesc;
private long _currentPosition;
private long _startPosition;
private long _currentTimeStamp;

private const int SampleRate = 44100;
private const int ChannelCount = 2;
private const int BitsPerSample = 16;
private const int ByteRate =
SampleRate * ChannelCount * BitsPerSample / 8;

private MemoryStream _stream;
private Random _random = new Random();

Other Functions

There are other functions you need to implement, even if you just throw an error or report them completed:

protected override void SeekAsync(long seekToTime)
{
    ReportSeekCompleted(seekToTime);
}

protected override void SwitchMediaStreamAsync(
    MediaStreamDescription mediaStreamDescription)
{
    throw new NotImplementedException();
}

protected override void CloseMedia()
{
    // Reset our position and release the stream description
    _startPosition = _currentPosition = 0;
    _audioDesc = null;
}

protected override void GetDiagnosticAsync(
    MediaStreamSourceDiagnosticKind diagnosticKind)
{
    throw new NotImplementedException();
}

Wiring up in Xaml

The next step is to wire our MediaStreamSource to a MediaElement in Silverlight. Declare the element in XAML, then set its source from the code-behind:

<MediaElement x:Name="TestMediaElement" AutoPlay="True" />

TestMediaElement.SetSource(new MyMediaStreamSource());

One word of caution. The current beta bits have a delay from the time a first sample is requested until you first hear audio. No amount of configuration on your part is going to change that, so don’t bother playing with buffering time settings. The team knows this is an issue, so I hope to see a solution at RTW.

More Complex Uses

You can certainly take this much further. For example, I built a basic synthesizer using MediaStreamSource. The synthesizer has multiple oscillators, each of which generates samples which are all mixed together and then output as a single two-channel stereo stream. You can try out an unstable/buggy build of the synthesizer here:

http://www.irritatedvowel.com/Silverlight/sl3/Synth/Default.html

(try the arpeggiator on the top right of the keyboard)
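The mixing step described above can be sketched roughly as follows. This is a minimal illustration, assuming sine oscillators and a simple sum-and-clamp mix. It is not the synthesizer's actual code, and the `MixerSketch` class and its methods are hypothetical.

```csharp
using System;

// Hypothetical sketch of mixing several oscillators into one 16-bit sample.
public static class MixerSketch
{
    // One sine oscillator value in the range [-1, 1].
    public static double Sine(double frequency, int sampleIndex, int sampleRate)
    {
        return Math.Sin(2.0 * Math.PI * frequency * sampleIndex / sampleRate);
    }

    // Sums the oscillator outputs, applies gain, and clamps to the 16-bit range
    // so that loud mixes clip instead of wrapping around.
    public static short Mix(double[] oscillatorOutputs, double gain)
    {
        double sum = 0;
        foreach (double value in oscillatorOutputs)
        {
            sum += value;
        }

        double scaled = sum * gain * short.MaxValue;
        if (scaled > short.MaxValue) scaled = short.MaxValue;
        if (scaled < short.MinValue) scaled = short.MinValue;
        return (short)scaled;
    }

    public static void Main()
    {
        const int sampleRate = 44100;
        for (int i = 0; i < 4; i++)
        {
            double[] outputs =
            {
                Sine(440.0, i, sampleRate),
                Sine(660.0, i, sampleRate)
            };

            // A gain of 0.5 keeps two full-scale oscillators within range.
            short mixed = Mix(outputs, 0.5);

            // The same value goes to the left and right channels here.
            Console.WriteLine("frame {0}: L={1} R={1}", i, mixed);
        }
    }
}
```

Scaling the gain by the number of oscillators is one simple way to avoid clipping, at the cost of a quieter mix when only one oscillator is active.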

There are a number of bugs in my code for that synthesizer, including issues with distortion and getting all out of whack, so if you hit an issue, first lower the volume. If that doesn’t take care of it, refresh the browser and try again.

Source

The source code for the example in this post is available in the attached PeteBrownSilverlightSound.zip.

Enjoy!

Attachment: PeteBrownSilverlightSound.zip (1.03 MB)
Published at DZone with permission of its author, Pete Brown. (source)


Comments

Mark Kestenbaum replied on Wed, 2009/09/16 - 9:53am

Great post and it took me a long way to getting my project to work. I do have a problem in that I'm using PCM 8 Bit audio encoded in a-law. When I pass the format type as "PCM", I can hear the audio but it sounds staticy. Trying any other form of FormatType gives an invalid file type error. Any ideas? The solution was to convert the a-law into straight 16 bit PCM which now sounds fine.

Mark Kestenbaum replied on Wed, 2009/09/16 - 9:55am

I have another problem. For some reason, when using the above method, the mediaelement does not return a correct value for "Position". (It moves 1 second about every 7 seconds). Checking the mediaelement I also noticed that the NaturalDuration is not set, nor is there a way to access the MediaStreamSource. All of the above is to get a textbox to display the number of seconds which have passed as the media plays. Since the Position is incorrect, I have no way to get this value. Any ideas?

Mark Kestenbaum replied on Tue, 2009/09/22 - 2:20am

I have no idea if anyone is reading these comments, but just in case, I'd like to add the following to my previous post: The problem appears to be connected to the timestamp parameter of the MediaStreamSample constructor. If I multiply that value by about 6.15, the Position property is more or less accurate. I say "more or less" because it starts out less than it should be, but eventually grows to be accurate. I have no explanation for this behavior which seems very strange to me. Again, I'd appreciate any comments from anyone who's had similar experiences.
