Wednesday, March 19, 2014

Kinecting Back to Blogging

It's been about two weeks since my last blog post, so I figure I should make this particular post about what exactly is going on.  Back in December, I applied for the Microsoft Kinect for Windows Version 2 alpha.  In February, I found out that I was accepted, and went ahead and got the sensor as a birthday present to myself.  Finally, when my spring break started two weeks ago, I got the chance to sit down and fiddle with the hardware, and I've got to say, I am impressed so far.

So for today's post, I'll be going over how to use the Microsoft Kinect for Windows V2 APIs in C#, and the process of making a simple application.  The code for this will be presented at the end of the article, so if you're only interested in the code, just scroll down to the bottom.  There's a bit to this project since we'll be using multiple data streams, but hopefully it's not too bad.  Before I continue, I do want to stress that this is based on preliminary software and/or hardware, and that the APIs are preliminary and subject to change.

Setting up the Form

[Figure: The design of the form.  This is all drag and drop from the toolbox on the side.]

The first thing that you'll want to do is set up your project with a simple form and a PictureBox displayed within it.  This is all just dragging and dropping the correct elements onto the form.

Next, you'll want to add three buttons: one for previewing the color image, one for previewing the depth image, and one for previewing the infrared image.  I put these off to the side, as you can see in the image to the right.  In mine, I also went ahead and added a snapshot button, though this is optional (I won't go over saving the image, so consider that extra credit).

The next thing you'll want to do is set the buttons to be handled when clicked, so just double click each button to prepare those methods for later.

Now that we've gone ahead and set up the form, it's time to move on to the fun part: setting up the Kinect's data streams!

Setting up the Kinect Sensor

[Figure: The Kinect DLL reference to add.  Again, make sure its version is 2.0.0.0.]

To start out, you'll want to add the DLL reference for the Kinect to your project.  This can be done by right clicking your project's References and selecting "Add Reference".  Next, under "Assemblies", click on "Extensions" and find "Microsoft.Kinect".  Make sure that its version is 2.0.0.0, as this is the specific version for the new Kinect for Windows sensor.  This gives our project access to all of the classes and methods for using the Kinect for Windows V2 sensor.

You'll also need to make sure that your project is set up to compile for x64 only.  You can do this by right clicking your project, then clicking Properties.  Go to the Build tab on the left side bar and for "Platform target" select "x64".

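Next, you'll want to open the code view for "Form1.cs", and at the top, add these lines of code:

 using System.Drawing.Imaging;  
 using Microsoft.Kinect;  

The first line lets us use the PixelFormat and ImageLockMode types that appear later without fully qualifying them, and the second exposes the Kinect SDK methods and classes that we need.  After that you'll want to add some objects to your Form1 class like so: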

     /// <summary>  
     /// The Kinect sensor to obtain our data from.  
     /// </summary>  
     private KinectSensor sensor = null;  
   
     /// <summary>  
     /// A reader for getting frame data. This particular reader can be used to read from  
     /// multiple sources (in this case, color, depth, and infrared).  
     /// </summary>  
     private MultiSourceFrameReader frameReader = null;  
   
     /// <summary>  
     /// The raw pixel data received for the depth image from the Kinect sensor.  
     /// </summary>  
     private ushort[] rawDepthPixelData = null;  
     /// <summary>  
     /// The raw pixel data received for the infrared image from the Kinect sensor.  
     /// </summary>  
     private ushort[] rawIRPixelData = null;  
   
     /// <summary>  
     /// The type of image to display in the form.  
     /// </summary>  
     ImageType imageType = ImageType.Color;  
   
Starting off, we have the creation of our Kinect Sensor object and setting it to null to start with.  This will be the main object that we use to retrieve all of our data from the sensor and is pretty simple to start with.

Next up is the multi-source frame reader object.  With the new Kinect, it's usually helpful to retrieve multiple data streams from the sensor, and since our project specifically needs to do this, we need a multi-source frame reader.  What this does is pass all of the data through a single method in our application.  This helps speed up our processing as well, since the program does not have to switch between multiple methods for processing data, just a single one.

Then we have the arrays for the raw pixel data of the depth and infrared images.  When we read the data from the Kinect sensor, it comes out as a stream of raw values that we can then pass into our images to display them.  One thing to note is the range of those values: each color component is between 0 and 255, which is standard BGRA formatting (or different if you choose to change it), so we can pass the color data straight to our image buffer without an array of our own, but each depth or infrared value can be between 0 and 65535, which is why those arrays hold ushort values.  This means that displaying these images requires a bit more work, since the values are too large for a color component, but it's nothing too tough.  We'll cover that later though.
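
To make the difference in range concrete, here is a minimal sketch (not part of the project itself) of squeezing a 16-bit value into a single 8-bit color component by keeping only its high byte.  The PixelScaling class and ToByte method are names I made up for this illustration; they are not part of the Kinect SDK.

```csharp
using System;

// Hypothetical helper for illustration only; it is not part of the Kinect SDK.
public static class PixelScaling
{
    // Keeps the top 8 bits of a 16-bit sample so the result fits in one color component.
    public static byte ToByte(ushort sample)
    {
        return (byte)(sample >> 8);
    }
}

public static class PixelScalingDemo
{
    public static void Main()
    {
        Console.WriteLine(PixelScaling.ToByte(0));      // prints 0
        Console.WriteLine(PixelScaling.ToByte(32768));  // prints 128
        Console.WriteLine(PixelScaling.ToByte(65535));  // prints 255
    }
}
```

The methods later in this post use different shifts, but the idea is the same: the 16-bit value has to be reduced before it can be used as a color.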

Finally, we have the image type.  This determines what type of image the application will show the user, and uses the following enumeration, declared before the Form1 class:

   /// <summary>  
   /// An enumeration to decide what type of image to show the user.  
   /// </summary>  
   public enum ImageType  
   {  
     Color = 0,  
     Depth = 1,  
     IR = 2  
   }  
   
[Figure: The properties dialog for Form1 showing the form's events.]

Now that we have our objects set up, we need to initialize everything.  To do this, go back to the design view for Form1, and in the form's properties, select the "Events" view (the little lightning bolt image).  Go down to the "Load" event and double click the blank space next to it, and it will take you to the code view again, in a new "Form1_Load" method.  Here is where we will initialize all of our objects.  Let's go ahead and start by setting up everything for the Kinect sensor:

       // Check whether there are Kinect sensors available and select the default one.  
       if(KinectSensor.KinectSensors.Count > 0)  
       {  
         this.sensor = KinectSensor.Default;  
   
         // Check that the sensor was properly retrieved and is connected.  
         if(this.sensor != null)  
         {  
           if (this.sensor.Status == KinectStatus.Connected)  
           {  
             // Open the sensor for use.  
             this.sensor.Open();  
   
             // Next open the multi-source frame reader.  
             this.frameReader = this.sensor.OpenMultiSourceFrameReader(FrameSourceTypes.Color | FrameSourceTypes.Depth | FrameSourceTypes.Infrared);  
   
             // Retrieve the frame descriptions for each frame source.  
             FrameDescription depthFrameDescription = this.sensor.DepthFrameSource.FrameDescription;  
             FrameDescription irFrameDescription = this.sensor.InfraredFrameSource.FrameDescription;  
               
             // Afterwards, setup the data using the frame descriptions.
             // Depth and infrared have just one component per pixel (depth value or infrared value).  
             this.rawDepthPixelData = new ushort[depthFrameDescription.Width * depthFrameDescription.Height * 1];  
             this.rawIRPixelData = new ushort[irFrameDescription.Width * irFrameDescription.Height * 1];  
   
             // Finally, set the method for handling each multi-source frame that is captured.  
             this.frameReader.MultiSourceFrameArrived += frameReader_MultiSourceFrameArrived;  
           }  
         }  
       }  
   
Phew, that's a lot of code, so let's go over it section by section.  To start with, we need to check that there are Kinect sensors available, and then select the default sensor.  Once that is done, we need to check that it was properly set and is connected.  Afterwards, we can finally open the device for usage with "this.sensor.Open();".

Next up, we need to set the frame reader object.  This is done by calling OpenMultiSourceFrameReader on the Kinect sensor object.  We pass it several values, indicating what data streams we want to use.  In this case, we only need the color, depth, and infrared streams, though there are also audio, body, and long exposure infrared frames available, which I'll try to go over in a separate tutorial in the future.  When selecting these values, we simply OR them together into a single parameter value.
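
If OR-ing enumeration values together is new to you, here is a small stand-alone sketch of the idea.  DemoSourceTypes is a stand-in enumeration that I made up for this example; the real FrameSourceTypes lives in the Kinect SDK and its numeric values may well differ from the ones assumed here.

```csharp
using System;

// A stand-in enumeration for illustration; the member values are assumptions,
// not the actual values used by the Kinect SDK's FrameSourceTypes.
[Flags]
public enum DemoSourceTypes
{
    None = 0,
    Color = 1,
    Infrared = 2,
    Depth = 4,
    Body = 8
}

public static class FlagsDemo
{
    // Combines three flags into one value, the same way the reader is opened above.
    public static DemoSourceTypes Requested()
    {
        return DemoSourceTypes.Color | DemoSourceTypes.Depth | DemoSourceTypes.Infrared;
    }

    public static void Main()
    {
        DemoSourceTypes requested = Requested();

        // Each bit can be tested individually afterwards.
        Console.WriteLine(requested.HasFlag(DemoSourceTypes.Depth)); // prints True
        Console.WriteLine(requested.HasFlag(DemoSourceTypes.Body));  // prints False
    }
}
```

Because each member occupies its own bit, a single parameter can carry any combination of streams.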

Finally, we get the frame descriptions for the depth and infrared frame sources (the color frame's description is read later, from the frame itself).  This gives us the size of each frame (width and height), which we then use to initialize the raw data arrays.  You'll notice that for the depth and infrared frames, we only need one value per pixel.  This is because these streams don't return a color, but a single value indicating either how far away a pixel is, or how strong the infrared light coming from that pixel is.  Thus we only need one component for each pixel.
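
As a rough feel for the sizes involved, here is a short sketch of the arithmetic.  The 512 x 424 resolution is an assumption on my part for the sake of the example; in the real project you should always read the width and height from the FrameDescription, as the code above does.  BufferSizes is a made-up helper name.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class BufferSizes
{
    // Number of ushort elements for a frame with one component per pixel.
    public static int SingleComponentLength(int width, int height)
    {
        return width * height * 1;
    }

    // Number of bytes for a four-component (BGRA) image of the same size.
    public static int BgraByteLength(int width, int height)
    {
        return width * height * 4;
    }
}

public static class BufferSizesDemo
{
    public static void Main()
    {
        // Assumed resolution; query FrameDescription for the real values.
        int width = 512;
        int height = 424;

        Console.WriteLine(BufferSizes.SingleComponentLength(width, height)); // prints 217088
        Console.WriteLine(BufferSizes.BgraByteLength(width, height));        // prints 868352
    }
}
```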

Handling Frame Data

The last thing we do is set up a method for handling each multi-source frame.  This will grab the color, depth, or infrared data, and then draw the selected stream to the picture box we created at the beginning of the tutorial.  The work is split among four methods, however, as trying to do all of this in one method would result in some ugly code.  Let's start with the main frame arrived method:

       // Try to get the frame from its reference.  
       try  
       {  
         MultiSourceFrame frame = e.FrameReference.AcquireFrame();  
   
         if (frame != null)  
         {  
           // The frame is disposable, so make sure we state that we are using it.  
           using (frame)  
           {  
             try  
             {  
               // Then switch between the possible types of images to show, get its frame reference, then use it  
               // with the appropriate image.  
               switch (this.imageType)  
               {  
                 case ImageType.Color:  
                   ColorFrameReference colorFrameReference = frame.ColorFrameReference;  
                   useRGBAImage(colorFrameReference);  
                   break;  
                 case ImageType.Depth:  
                   DepthFrameReference depthFrameReference = frame.DepthFrameReference;  
                   useDepthImage(depthFrameReference);  
                   break;  
                 case ImageType.IR:  
                   InfraredFrameReference irFrameReference = frame.InfraredFrameReference;  
                   useIRImage(irFrameReference);  
                   break;  
               }  
             }  
             catch (Exception)  
             {  
               // Don't worry about exceptions for this demonstration.  
             }  
           }  
         }  
       }  
       catch (Exception)  
       {  
         // Don't worry about exceptions for this demonstration.  
       }  
   
Starting out, we get the frame in a try-catch clause.  We don't do any error checking for exceptions, though you could add this for dropped frames and such.  Next, we say that we are using the frame since it's disposable.  This will clean it up after we are done with this section.  Finally, depending on the type of image we want to show, we get the reference for that image's frame and pass it to the appropriate method.  The reason we pass the frame reference to the method instead of the actual frame is to save processing time: only the one frame we actually display gets acquired, and the method that acquires it is also the one that uses it and disposes of it.
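
As a side note on the using statement: C# allows the resource to be null, in which case nothing is disposed, so the null check can live inside the block if you prefer fewer nesting levels.  The sketch below shows this with a stand-in FakeFrame class that I made up so it runs without a sensor; it is not a Kinect type.

```csharp
using System;

// Stand-in for a disposable frame; it only records whether Dispose ran.
public sealed class FakeFrame : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        this.Disposed = true;
    }
}

public static class UsingDemo
{
    // Returns true when a frame was available and "processed".
    public static bool Process(FakeFrame frame)
    {
        // A null resource is allowed here; Dispose is simply not called.
        using (frame)
        {
            if (frame == null)
            {
                return false;
            }

            // Real drawing code would go here.
            return true;
        }
    }

    public static void Main()
    {
        FakeFrame frame = new FakeFrame();
        Console.WriteLine(UsingDemo.Process(frame)); // prints True
        Console.WriteLine(frame.Disposed);           // prints True
        Console.WriteLine(UsingDemo.Process(null));  // prints False
    }
}
```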

Now for the methods that actually handle the frames.  First up, we have the useRGBAImage method:

     /// <summary>  
     /// Draws color image data from the specified frame.  
     /// </summary>  
     /// <param name="frameReference">The reference to the color frame that should be used.</param>  
     private void useRGBAImage(ColorFrameReference frameReference)  
     {  
       // Actually acquire the frame here and check that it was properly acquired, and use it again since it too is disposable.  
       ColorFrame frame = frameReference.AcquireFrame();  
   
       if (frame != null)  
       {  
         using (frame)  
         {  
           // Next get the frame's description and create an output bitmap image.  
           FrameDescription description = frame.FrameDescription;  
           Bitmap outputImage = new Bitmap(description.Width, description.Height, PixelFormat.Format32bppArgb);  
   
           // Next, we create the raw data pointer for the bitmap, as well as the size of the image's data.  
           System.Drawing.Imaging.BitmapData imageData = outputImage.LockBits(new Rectangle(0, 0, outputImage.Width, outputImage.Height),  
             ImageLockMode.WriteOnly, outputImage.PixelFormat);  
           IntPtr imageDataPtr = imageData.Scan0;  
           int size = imageData.Stride * outputImage.Height;  
   
           // After this, we copy the image data directly to the buffer. Note that while this is in BGRA format, it will be flipped due  
           // to the endianness of the data.  
           if (frame.RawColorImageFormat == ColorImageFormat.Bgra)  
           {  
             frame.CopyRawFrameDataToBuffer((uint)size, imageDataPtr);  
           }  
           else  
           {  
             frame.CopyConvertedFrameDataToBuffer((uint)size, imageDataPtr, ColorImageFormat.Bgra);  
           }  
   
           // Finally, unlock the output image's raw data again and create a new bitmap for the preview picture box.  
           outputImage.UnlockBits(imageData);  
           this.previewPictureBox.Image = outputImage;  
         }  
       }  
     }  
   
The first thing we do is acquire the frame and make sure that it was retrieved properly.  We then use the frame so it will be disposed of later, and get its frame description too.  Next we lock the output image's data and get a pointer to it and the size of the data.  We then copy the frame to this data pointer.  Note that we want to retrieve the image color in BGRA format, but the image is expecting ARGB data.  This is actually okay because the endianness of the data will correct for this.  Finally, we unlock the data and set the preview picture box image to be the output image.  Note that we have set the picture box to stretch the image to fit, so we don't have to resize the image here.
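
If the endianness remark seems like hand-waving, this stand-alone sketch shows it directly: a 32-bit ARGB value, when stored on a little-endian machine, ends up in memory as the bytes B, G, R, A.  ArgbLayoutDemo is a made-up name for this illustration and uses nothing from the Kinect SDK.

```csharp
using System;

public static class ArgbLayoutDemo
{
    // Packs the components as a 32-bit ARGB integer and returns its in-memory bytes
    // in this machine's native byte order.
    public static byte[] BytesOf(byte a, byte r, byte g, byte b)
    {
        int argb = (a << 24) | (r << 16) | (g << 8) | b;
        return BitConverter.GetBytes(argb);
    }

    public static void Main()
    {
        byte[] bytes = BytesOf(0xFF, 0x11, 0x22, 0x33);

        Console.WriteLine(BitConverter.IsLittleEndian);

        // On a little-endian machine this prints 33-22-11-FF, i.e. B, G, R, A.
        Console.WriteLine(BitConverter.ToString(bytes));
    }
}
```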

Next up, we have the depth image frame's usage method.

     /// <summary>  
     /// Draws depth image data from the specified frame.  
     /// </summary>  
     /// <param name="frameReference">The reference to the depth frame that should be used.</param>  
     private void useDepthImage(DepthFrameReference frameReference)  
     {  
       // Actually acquire the frame here and check that it was properly acquired, and use it again since it too is disposable.  
       DepthFrame frame = frameReference.AcquireFrame();  
   
       if (frame != null)  
       {  
         FrameDescription description = null;  
         using (frame)  
         {  
           // Next get the frame's description and create an output bitmap image.  
           description = frame.FrameDescription;  
           Bitmap outputImage = new Bitmap(description.Width, description.Height, PixelFormat.Format32bppArgb);  
   
           // Next, we create the raw data pointer for the bitmap, as well as the size of the image's data.  
           System.Drawing.Imaging.BitmapData imageData = outputImage.LockBits(new Rectangle(0, 0, outputImage.Width, outputImage.Height),  
             ImageLockMode.WriteOnly, outputImage.PixelFormat);  
           IntPtr imageDataPtr = imageData.Scan0;  
           int size = imageData.Stride * outputImage.Height;  
   
           // After this, we copy the image data into its array. We then go through each pixel and shift the data down for the  
           // RGB values, as their normal values are too large.  
           frame.CopyFrameDataToArray(this.rawDepthPixelData);  
           byte[] rawData = new byte[description.Width * description.Height * 4];  
           int i = 0;  
           foreach (ushort point in this.rawDepthPixelData)  
           {  
             rawData[i++] = (byte)(point >> 6);  
             rawData[i++] = (byte)(point >> 4);  
             rawData[i++] = (byte)(point >> 2);  
             rawData[i++] = 255;  
           }  
           // Next, the new raw data is copied to the bitmap's data pointer, and the image is unlocked using its data.  
           System.Runtime.InteropServices.Marshal.Copy(rawData, 0, imageDataPtr, size);  
           outputImage.UnlockBits(imageData);  
   
           // Finally, the image is set for the preview picture box.  
           this.previewPictureBox.Image = outputImage;  
         }  
       }  
     }  
   
This method is almost exactly the same as before, except for one major change: we are now copying the data to an output array, which is then copied to the bitmap image.  This is because the data is not ready to be displayed as a color directly, since the values for each pixel are too large.  Remember, they represent how far away a pixel is, not its color, and so we need to account for this when rendering the image.
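
To see what the three shifts in the loop actually produce, here is a stand-alone sketch that mirrors the loop body for a single value.  DepthColoring and ToBgra are made-up names for this illustration, and the sample values are arbitrary numbers I picked, not measurements.

```csharp
using System;

public static class DepthColoring
{
    // Mirrors the loop body above: three shifts of the same sample, each truncated
    // to a byte, followed by a fully opaque alpha value.
    public static byte[] ToBgra(ushort point)
    {
        return new byte[]
        {
            (byte)(point >> 6), // blue
            (byte)(point >> 4), // green
            (byte)(point >> 2), // red
            255                 // alpha
        };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ToBgra(1000))); // prints 15, 62, 250, 255
        Console.WriteLine(string.Join(", ", ToBgra(4000))); // prints 62, 250, 232, 255
    }
}
```

Notice that for 4000 the red component is 232 rather than 1000: the cast to byte keeps only the low 8 bits, so the colors cycle as the depth grows.  That cycling is what gives the depth preview its banded look.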

The infrared method is almost exactly the same as you can see:

     /// <summary>  
     /// Draws infrared image data from the specified frame.  
     /// </summary>  
     /// <param name="frameReference">The reference to the infrared frame that should be used.</param>  
     private void useIRImage(InfraredFrameReference frameReference)  
     {  
       // Actually acquire the frame here and check that it was properly acquired, and use it again since it too is disposable.  
       InfraredFrame frame = frameReference.AcquireFrame();  
   
       if (frame != null)  
       {  
         FrameDescription description = null;  
         using (frame)  
         {  
           // Next get the frame's description and create an output bitmap image.  
           description = frame.FrameDescription;  
           Bitmap outputImage = new Bitmap(description.Width, description.Height, PixelFormat.Format32bppArgb);  
   
           // Next, we create the raw data pointer for the bitmap, as well as the size of the image's data.  
           System.Drawing.Imaging.BitmapData imageData = outputImage.LockBits(new Rectangle(0, 0, outputImage.Width, outputImage.Height),  
             ImageLockMode.WriteOnly, outputImage.PixelFormat);  
           IntPtr imageDataPtr = imageData.Scan0;  
           int size = imageData.Stride * outputImage.Height;  
   
           // After this, we copy the image data into its array. We then go through each pixel and shift the data down for the  
           // RGB values, and set each one to the same value, resulting in a grayscale image, as their normal values are too large.  
           frame.CopyFrameDataToArray(this.rawIRPixelData);  
           byte[] rawData = new byte[description.Width * description.Height * 4];  
           int i = 0;  
           foreach (ushort point in this.rawIRPixelData)  
           {  
             byte value = (byte)(128 - (point >> 8));  
             rawData[i++] = value;  
             rawData[i++] = value;  
             rawData[i++] = value;  
             rawData[i++] = 255;  
           }  
           // Next, the new raw data is copied to the bitmap's data pointer, and the image is unlocked using its data.  
           System.Runtime.InteropServices.Marshal.Copy(rawData, 0, imageDataPtr, size);  
           outputImage.UnlockBits(imageData);  
   
           // Finally, the image is set for the preview picture box.  
           this.previewPictureBox.Image = outputImage;  
         }  
       }  
     }  
   
The notable difference is that the values are now grayscale to differentiate them from the depth image data, and that the high byte of each value is subtracted from 128 before it is written out.  This results in a clearer image in my opinion, though you can of course change this for accuracy purposes.
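
If you do want to experiment with the mapping, it helps to see what the arithmetic does at the extremes.  The sketch below compares the expression from the method above with a plain high-byte mapping; InfraredScaling and its method names are made up for this illustration, and the plain mapping is just one alternative, not a recommendation from the SDK.

```csharp
using System;

public static class InfraredScaling
{
    // The mapping used in the method above: 128 minus the high byte, truncated to a byte.
    public static byte Original(ushort sample)
    {
        return unchecked((byte)(128 - (sample >> 8)));
    }

    // One possible alternative: use the high byte directly, so brighter infrared
    // always means a brighter pixel.
    public static byte HighByte(ushort sample)
    {
        return (byte)(sample >> 8);
    }

    public static void Main()
    {
        ushort[] samples = { 0x0000, 0x8000, 0x8100, 0xFFFF };

        foreach (ushort sample in samples)
        {
            Console.WriteLine("{0}: original {1}, high byte {2}",
                sample, Original(sample), HighByte(sample));
        }
    }
}
```

With the original expression, a sample of 0 maps to 128 and 0x8000 maps to 0, but anything with a high byte above 128 goes negative and wraps around when cast to a byte (0x8100 becomes 255).  Whether that matters depends on how strong the infrared values in your scene actually get.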

Finally, we need to handle the cleanup process.  Head over to your form's design view, find the "FormClosing" event, and once again double click it.  This will add a new method in the code view, to which you can then add this code:

       if (this.frameReader != null)  
       {  
         this.frameReader.Dispose();  
       }  
       if (this.sensor != null)  
       {  
         this.sensor.Dispose();  
       }  
   
All this does is check whether there is a frame reader and a sensor and dispose of both if they exist.  The final thing you'll want to do is go back and set the imageType variable depending on which button is clicked using the methods we set up within the form at the beginning, and presto, you now have a full demo of the new Kinect for Windows V2 sensor!
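
Since I didn't show the button handlers, here is a rough sketch of what they boil down to.  The handler names below are hypothetical (Visual Studio generates them from whatever you named your buttons), and PreviewSelector is a stripped-down stand-in for Form1 so that the sketch runs on its own without Windows Forms.

```csharp
using System;

public enum ImageType
{
    Color = 0,
    Depth = 1,
    IR = 2
}

// A stand-in for Form1; in the real project these handlers live in the form class.
public class PreviewSelector
{
    public ImageType imageType = ImageType.Color;

    // Hypothetical handler names; yours depend on each button's Name property.
    public void colorButton_Click(object sender, EventArgs e)
    {
        this.imageType = ImageType.Color;
    }

    public void depthButton_Click(object sender, EventArgs e)
    {
        this.imageType = ImageType.Depth;
    }

    public void irButton_Click(object sender, EventArgs e)
    {
        this.imageType = ImageType.IR;
    }
}

public static class PreviewSelectorDemo
{
    public static void Main()
    {
        PreviewSelector selector = new PreviewSelector();

        // Simulate a click on the depth button.
        selector.depthButton_Click(null, EventArgs.Empty);
        Console.WriteLine(selector.imageType); // prints Depth
    }
}
```

The frame arrived method reads imageType on every frame, so the preview switches as soon as the next frame comes in.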

Conclusion

While this may seem like a lot of code, remember that you don't need all of these streams, and can get away with using just one instead.  If you have any questions about this code, feel free to leave a comment and I'll try to answer if I can!  This hardware is incredible to look at and has some nifty features, so I'll try to post more tutorials as time goes on.  You can find this code on my GitHub page here as well.