
Kinect SDK and F# - a standard library in a non-standard language

06.21.2011

F# is yet another programming language integrated into the .NET Framework, probably a bit less popular than C#, C++/CLI and VB.NET (all of them part of the platform). Nonetheless, it has a pretty solid user base, and it uses the same .NET capabilities as any other Microsoft managed language; the main difference is that it follows the functional rather than the object-oriented paradigm.

I decided to go ahead and learn more about this gem, and in the meantime do something useful. My main goal was to build some Kinect SDK templates for Visual Studio, but this also required me to rewrite some fairly basic samples in a language I knew nothing about.

First things first - an F# application is as raw as it can be. There are no pre-built classes and no visual designer; development starts with a completely empty file. So I needed to add the proper library references.

I decided to use WPF classes in the context of my templates (after all, I am handling video/depth/skeleton content), so I needed:

  • PresentationCore
  • PresentationFramework
  • WindowsBase
  • System.Xaml 

Microsoft.Research.Kinect is obviously the Kinect SDK library.

Time to define the file header:

#light

open System
open System.Windows
open System.Windows.Media.Imaging
open Microsoft.Research.Kinect.Nui

The #light directive enables F#'s lightweight syntax, which relies on indentation and spacing instead of explicit delimiters, making it a bit easier for inexperienced developers to follow the program structure. The open statements are nothing more than using (in C#) and Imports (in VB.NET).
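For example, without the open statements every type would need to be fully qualified (a trivial illustration, not part of the sample):

let w1 = new System.Windows.Window()   // fully qualified, works without any open statement
let w2 = new Window()                  // shorter, works because System.Windows is open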

Now I need to make sure that I have an actual window to show. Simply instantiating the Window class will do:

let window = new Window()
window.Width <- 800.0
window.Height <- 600.0
window.Title <- "Kinect Video Application"
window.Loaded.AddHandler(new RoutedEventHandler(WindowLoaded))
window.Unloaded.AddHandler(new RoutedEventHandler(WindowUnloaded))
window.Content <- grid

window.Show()

Notice that the way I am assigning values is a bit different compared to C# and VB.NET. When binding a new value with let, I use the equals sign; when assigning a value to a mutable property, I use the left arrow (<-).

The different approach continues when attaching event handlers: via AddHandler instead of a direct += assignment.
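To make these idioms concrete, here is a minimal sketch using a plain Button as a stand-in (this control is not part of the actual sample):

let button = new System.Windows.Controls.Button()    // let binds a new value with =
button.Content <- "Click me"                         // <- assigns to a mutable property
button.Click.AddHandler(new RoutedEventHandler(fun sender args -> ()))   // events go through AddHandler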

Here is what my event handlers look like:

let WindowLoaded (sender : obj) (args: RoutedEventArgs) = 
    runtime.Initialize(RuntimeOptions.UseColor)
    runtime.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color)
    runtime.VideoFrameReady.AddHandler(new EventHandler<ImageFrameReadyEventArgs>(VideoFrameReady))
 
let WindowUnloaded (sender : obj) (args: RoutedEventArgs) = 
    runtime.Uninitialize()  

runtime here is an instance of Microsoft.Research.Kinect.Nui.Runtime:

let runtime = new Runtime()

I am using the VideoStream and event handling in a pretty obvious way. Notice how parameters are declared in the event handlers - each in its own parenthesized block, with the name first and the type second, separated by a colon.
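The same shape applies to any handler. A hypothetical one, just to show the syntax:

let OnSomething (sender : obj) (args : EventArgs) =
    printfn "Raised by %O" sender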

Now I need a grid and an Image control, where the obtained data will actually be displayed:

let grid = new System.Windows.Controls.Grid()
let winImage = new System.Windows.Controls.Image()
winImage.Height <- 480.0
winImage.Width <- 640.0

grid.Children.Add(winImage) |> ignore

The height and width should be passed as floats. Be aware that F# strictly distinguishes between the float types - float (System.Double) and float32 (System.Single) - so make sure you perform the proper type conversion when passing values around.
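A quick sketch of the distinction (the bindings below are purely illustrative):

let d : float = 480.0       // System.Double - what WPF properties such as Height expect
let s : float32 = 480.0f    // System.Single - what the Kinect skeleton coordinates use
let widened = float s       // float32 -> float conversion
let narrowed = float32 d    // float -> float32 conversion
// winImage.Height <- s     // would not compile: Height expects float, not float32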

I already initialized the runtime, so all I need to do now is handle the result once the video frame is ready:

let VideoFrameReady (sender : obj) (args: ImageFrameReadyEventArgs) = 
    let image = args.ImageFrame.Image
    let source = BitmapSource.Create(image.Width, image.Height, 96.0, 96.0, Media.PixelFormats.Bgr32, null, image.Bits, image.Width * image.BytesPerPixel)
    winImage.Source <- source 

With everything set up, I need to make sure that the application is launched correctly instead of ending up as a plain console program (although the console window will be useful, as you will see later on). I need to invoke this:

[<STAThread()>]
do 
    let app = new Application() in
    app.Run(window) |> ignore

The STAThreadAttribute sets the apartment state of the main thread to single-threaded, which WPF requires; the application will ultimately fail to run without it. To learn more about why you need this attribute, I highly recommend reading this article.

Now, when the application runs, a new window is shown, along with the console window that plays the role of a debugger console.
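Since the console stays attached, plain printfn calls double as a quick tracing tool. For instance, this hypothetical line (not part of the original sample) would print into it from a handler:

printfn "Video frame received at %O" DateTime.Now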

The depth detection application works in a similar manner - all I changed were the video-specific calls, which are replaced with their depth equivalents. Take a look at the full project:

// Learn more about F# at http://fsharp.net
#light

open System
open System.Windows
open System.Windows.Media.Imaging
open Microsoft.Research.Kinect.Nui

let runtime = new Runtime()
let grid = new System.Windows.Controls.Grid()
let winImage = new System.Windows.Controls.Image()
winImage.Height <- 480.0
winImage.Width <- 640.0

grid.Children.Add(winImage) |> ignore

//Depth frame is ready to be processed.
let DepthFrameReady (sender : obj) (args: ImageFrameReadyEventArgs) = 
    let image = args.ImageFrame.Image
    let source = BitmapSource.Create(image.Width, image.Height, 96.0, 96.0, Media.PixelFormats.Gray16, null, image.Bits, image.Width * image.BytesPerPixel)
    winImage.Source <- source 

let WindowLoaded (sender : obj) (args: RoutedEventArgs) = 
    runtime.Initialize(RuntimeOptions.UseDepth)
    runtime.DepthStream.Open(ImageStreamType.Depth, 2, ImageResolution.Resolution320x240, ImageType.Depth)
    runtime.DepthFrameReady.AddHandler(new EventHandler<ImageFrameReadyEventArgs>(DepthFrameReady))
 
let WindowUnloaded (sender : obj) (args: RoutedEventArgs) = 
    runtime.Uninitialize()  

let window = new Window()
window.Width <- 800.0
window.Height <- 600.0
window.Title <- "Kinect Depth Application"
window.Loaded.AddHandler(new RoutedEventHandler(WindowLoaded))
window.Unloaded.AddHandler(new RoutedEventHandler(WindowUnloaded))
window.Content <- grid

window.Show()

[<STAThread()>]
do 
    let app = new Application() in
    app.Run(window) |> ignore

As simple as that. I get a grayscale image as a result.

The situation becomes a bit trickier when it comes to skeletal tracking, because the skeleton coordinates have to be transformed before they can be shown on screen. Library and framework dependencies are still the same, but the program structure is a bit different.

I need to manually create the canvas and the Ellipse instances that will represent my hands and head on the screen:

//The main canvas that is handling the ellipses
let canvas = new System.Windows.Controls.Canvas()
canvas.Background <- System.Windows.Media.Brushes.Transparent

//Right hand ellipse
let rhEllipse = new System.Windows.Shapes.Ellipse()
rhEllipse.Height <- 20.0
rhEllipse.Width <- 20.0
rhEllipse.Fill <- System.Windows.Media.Brushes.Red
rhEllipse.Stroke <- System.Windows.Media.Brushes.White

//Left hand ellipse
let lhEllipse = new System.Windows.Shapes.Ellipse()
lhEllipse.Height <- 20.0
lhEllipse.Width <- 20.0
lhEllipse.Fill <- System.Windows.Media.Brushes.Red
lhEllipse.Stroke <- System.Windows.Media.Brushes.White

//Head ellipse
let hEllipse = new System.Windows.Shapes.Ellipse()
hEllipse.Height <- 20.0
hEllipse.Width <- 20.0
hEllipse.Fill <- System.Windows.Media.Brushes.Red
hEllipse.Stroke <- System.Windows.Media.Brushes.White 

canvas.Children.Add(rhEllipse) |> ignore
canvas.Children.Add(lhEllipse) |> ignore
canvas.Children.Add(hEllipse) |> ignore

Nothing overly complicated here, but those who have worked with WPF before will probably find it much easier to build the UI through XAML rather than code-behind (remember that there is no visual designer for F# applications).
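If you prefer markup, XAML can still be loaded at runtime even without a designer. A minimal sketch (the markup string is my own illustration, not part of the original project):

open System.Windows.Markup

let xaml = "<Canvas xmlns=\"http://schemas.microsoft.com/winfx/2006/xaml/presentation\" Background=\"Transparent\" />"
let canvasFromXaml = XamlReader.Parse(xaml) :?> System.Windows.Controls.Canvas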

One very important piece of code that I use all the time is the function that scales the skeleton vectors to the size of the screen:

//Required to correlate the skeleton data to the PC screen
//IMPORTANT NOTE: Code for vector scaling was imported from the Coding4Fun Kinect Toolkit
//available here: http://c4fkinect.codeplex.com/
//I only used this part to avoid adding an extra reference.
let ScaleVector (length : float32, position : float32)  =
    let value = (((length / 1.0f) / 2.0f) * position) + (length / 2.0f)
    if value > length then
        length
    elif value < 0.0f then
        0.0f
    else
        value

As the comment says, I translated this piece from the Coding4Fun Kinect Toolkit. It does the job pretty well, and I just didn't want to add an extra dependency for a single call.
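A quick sanity check of the math, assuming the skeleton coordinates arrive roughly in the -1.0 to 1.0 range:

ScaleVector(640.0f, 0.5f)    // ((640 / 1) / 2) * 0.5 + 640 / 2 = 160 + 320 = 480
ScaleVector(640.0f, 2.0f)    // raw value 960 exceeds the length, so it is clamped to 640
ScaleVector(640.0f, -2.0f)   // raw value -320 is negative, so it is clamped to 0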

Scaling is not the only problem. Another issue was ellipse positioning, and here is what I came up with (F#-wise):

//This will set the ellipse positions depending on the passed instance and joint
let SetEllipsePosition (ellipse : System.Windows.Shapes.Ellipse, joint : Joint) =
    let vector = new Microsoft.Research.Kinect.Nui.Vector(X = ScaleVector(640.0f, joint.Position.X), Y=ScaleVector(480.0f, -joint.Position.Y),Z=joint.Position.Z)
    let uJoint = new Joint(ID = joint.ID, TrackingState = JointTrackingState.Tracked, Position=vector)
    System.Windows.Controls.Canvas.SetLeft(ellipse,(float uJoint.Position.X))
    System.Windows.Controls.Canvas.SetTop(ellipse,(float uJoint.Position.Y))

Since the Joint type won't let me mutate its properties after the fact, I am setting them directly through the constructor call. The same applies to the Vector instance I am using.
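A quick illustration of the pattern, assuming Vector behaves like Joint here:

// with a plain let binding, the properties cannot be reassigned afterwards;
// a let mutable binding would be needed for that - hence the constructor-style initialization
let v = new Microsoft.Research.Kinect.Nui.Vector(X = 1.0f, Y = 2.0f, Z = 3.0f)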

Once the skeleton frame is ready, I am handling the received content via a standard event handler represented through a function:

//Triggered when a new skeleton frame is ready for processing
let SkeletonFrameReady (sender : obj) (args: SkeletonFrameReadyEventArgs) = 
    let skeletonSet = args.SkeletonFrame
    
    for i in skeletonSet.Skeletons do
        if i.TrackingState = SkeletonTrackingState.Tracked then
            SetEllipsePosition(hEllipse, i.Joints.Item(JointID.Head))
            SetEllipsePosition(lhEllipse, i.Joints.Item(JointID.HandLeft))
            SetEllipsePosition(rhEllipse, i.Joints.Item(JointID.HandRight))

When starting the application, make sure you initialize the runtime so that it is actually ready to handle skeletal data. Note that the two RuntimeOptions flags are combined with |||, F#'s bitwise-or operator (the C#-style + does not compile for enum flags in F#):

let WindowLoaded (sender : obj) (args: RoutedEventArgs) = 
    runtime.Initialize(RuntimeOptions.UseColor ||| RuntimeOptions.UseSkeletalTracking)
    runtime.VideoStream.Open(ImageStreamType.Video, 2, ImageResolution.Resolution640x480, ImageType.Color)
    runtime.VideoFrameReady.AddHandler(new EventHandler<ImageFrameReadyEventArgs>(VideoFrameReady))
    runtime.SkeletonFrameReady.AddHandler(new EventHandler<SkeletonFrameReadyEventArgs>(SkeletonFrameReady))

When a properly trackable skeleton is detected, I am able to visualize it (obviously it is a bit off in the image below).

I packed all three Kinect application types in a set of Visual Studio templates that you can download from the KinectContrib page.

Some great resources that helped me through my initial F# journey are located here and here. Also, special thanks to Richard Minerich for checking some of my experimental F# code. Overall, not bad for a first try.