Joshua Blake on Kinect for Windows and the Natural User Interface Revolution (Part 3)
The following blog post was guest authored by Kinect for Windows (K4W) MVP Joshua Blake. Josh is the Technical Director of the InfoStrat Advanced Technology Group in Washington, D.C., where he and his team work on cutting-edge Kinect and NUI projects for their clients. You can find him on Twitter at @joshblake or at his blog, https://nui.joshland.org.
Josh recently recorded several videos for our Kinect for Windows Developer Center. This is the third of three posts he will be contributing this month to the blog.
In part 1, I shared videos covering the core natural user interface concepts and a sample application called Kinect PowerPoint Control that I use to control presentations. In part 2, I shared two more advanced sample applications: Kinect Weather Map and Face Fusion. In this post, I’m going to share videos that show some of the real-life applications that my team and I created for one of our clients. I’ll also provide some additional detail about how and why we created a custom object tracking interaction. These applications put my NUI concepts into action and show what is possible with Kinect for Windows.
Making it fun to learn
Our client, Kaplan Early Learning Company, sells teaching resources focused on early childhood education. Kaplan approached us with an interest in creating a series of educational applications for preschool and kindergarten-aged children designed to teach one of several core skills such as basic patterns, spelling simple words, shapes, and spatial relationships. While talking to Kaplan, we learned they had a goal of improving student engagement and excitement while making core skills fun to learn.
We suggested using Kinect for Windows because it would allow the students not just to interact with the activity but to be immersed in virtual worlds, using their bodies and physical objects to interact. Kaplan loved the idea, and we began creating the applications. After a few iterations of design and development, testing with real students, and feedback, we shipped the final builds of four applications to Kaplan earlier this summer. Kaplan is now selling these applications, bundled with a Kinect for Windows sensor, in their catalog as Kaplan Move-NG.
The Kinect for Windows team and I created the videos embedded below to discuss our approach to addressing challenges involved in designing these applications and to demonstrate the core parts of three of the Move-NG applications.
Designing early childhood education apps for Kaplan
In the video below, I discuss InfoStrat’s guiding principles for creating great applications for Kinect, as well as some of the specific challenges we faced creating applications that are fun and exciting for young children while remaining educational and fitting into a classroom environment. Below the video, read on for additional discussion and three more videos showing the actual applications.
Real-world K4W apps: Designing early childhood education apps for Kaplan (7:32)
One of the key points covered in this video is that when designing a NUI application, we have to consider the context in which the application will be used. In the education space, especially in early childhood education, this context often includes both teachers and students, so we have to design the applications with both types of users in mind. Here are a few of the questions we thought about while designing these apps for Kaplan:
- When will the teacher use the app and when will the students use the app?
- Will the teacher be more comfortable using the mouse or the Kinect for specific tasks? Which input device is most appropriate for each task?
- Will non-technical teachers understand how to set up the space and use the application? Does there need to be a special setup screen to help the teacher configure the classroom space?
- How will the teachers and students interact while the application is running?
- How long would it take to give every student a turn in a typical-size classroom?
- What is the social context in the classroom, and what unwritten social behavior rules can we take into account to simplify the application design?
- Will the user interaction work with both adults and the youngest children?
- Will the user interaction work across the various ways children respond to visual cues and voice prompts?
- Is the application fun?
- Do students across the entire target age group understand what to do with minimal or no additional prompts from the teacher?
And most importantly:
- Does the design satisfy the educational goals set for the application?
As you can imagine, finding a solution to all of these questions was quite a challenge. We took an iterative approach and tested with real children in the target age range as often as possible. Fortunately, my three daughters are in the target age range so I could do quick tests at home almost daily and get feedback. We also sent early builds to Kaplan to get a broader range of feedback from their educators and additional children.
In several cases, we created a prototype of a design or interaction that worked well for ourselves as adults but failed completely when tested with children. Sometimes the problem was that the tracking data from the children’s smaller bodies was noisier. Other times the problem was that the children just didn’t understand what they were supposed to do, even with prompting, guidance, or demonstration. It was particularly challenging when a concept worked with older kindergarten kids but was too complex for the youngest of the preschool age range. In those cases the design relied on a cognitive development milestone that falls within our target age range, and we simply had to find another solution. I will share an example of this near the end of this post.
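As an aside for developers facing the same noise problem: the Kinect for Windows SDK 1.x exposes a tunable skeleton smoothing filter when enabling the skeleton stream. Below is a minimal sketch with illustrative values only; the right balance of smoothing versus latency has to be found by testing with your own users.

```csharp
using System.Linq;
using Microsoft.Kinect;

// Pick the first connected sensor.
var sensor = KinectSensor.KinectSensors
    .FirstOrDefault(s => s.Status == KinectStatus.Connected);

if (sensor != null)
{
    // Enable skeleton tracking with the SDK's built-in smoothing filter.
    // These values are illustrative, not tuned settings.
    sensor.SkeletonStream.Enable(new TransformSmoothParameters
    {
        Smoothing = 0.7f,           // 0..1; higher = smoother but more lag
        Correction = 0.3f,          // how quickly to converge toward raw data
        Prediction = 0.4f,          // frames predicted into the future
        JitterRadius = 0.1f,        // meters; jitter beyond this is clamped
        MaxDeviationRadius = 0.1f   // meters; cap on filtered-vs-raw drift
    });
    sensor.Start();
}
```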
Kaplan Move-NG application and behind-the-scenes videos
The next three videos each cover one of the Kaplan Move-NG applications. Each video introduces the educational goal of the app and demonstrates its core interaction. In addition, I discuss the design challenges mentioned above as well as implementation details such as which parts of the Kinect for Windows SDK we used, how we created a particular interaction, or how feedback from student testing affected the application design. These videos should give you a quick overview of the apps as well as a behind-the-scenes look at what went into the designs. I hope sharing our experience will help you create better applications that incorporate the interactivity and fun of Kinect.
Real-world K4W apps: Kaplan Move-NG Patterns (6:28)
Real-world K4W apps: Kaplan Move-NG Where Am I (5:57)
Real-world K4W apps: Kaplan Move-NG Word Pop (7:41)
Object tracking as a natural interaction
The last video above showed Word Pop, which has the unique feature of letting the user spell words by catching letters with a physical basket (or box). In the video, I showed how we created a custom basket tracker by transforming the Kinect depth data. (My technique was inspired by Kyle McDonald’s work at the Art && Code 2011 conference, as shown at 1:43 in his festival demonstration.) Figure 1 shows the basket tracker developer UI as shown in the Word Pop video. In this section, I’m going to give a little more detail on how this basket tracker works and what led to this design.
Figure 1: The basket tracker developer UI used internally during development of Word Pop. The left image in the interface shows the background-removed user and basket, with a rectangle drawn around the basket. The right image shows a visualization of how the application is transforming the depth data.
To find the basket, we excluded the background and the user’s torso from the depth image and then applied the Sobel operator, which approximates the depth gradient at each point. We mark pixels with a low gradient magnitude as flat pixels, shown in white in figure 1. The gradient threshold for determining flat pixels was found empirically.
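To make that concrete, here is a minimal sketch of the flat-pixel classification step, assuming the background and torso have already been masked out of the depth array (masked pixels set to zero). The `FlatThreshold` value and all names are hypothetical; this illustrates the technique rather than reproducing the Word Pop source.

```csharp
using System;

static class FlatPixelFilter
{
    // Hypothetical threshold; as noted above, the real value was
    // found empirically.
    const int FlatThreshold = 40;

    // depth[] holds per-pixel depth in millimeters; 0 marks pixels
    // already masked out (the removed background and the user's torso).
    public static bool[] FindFlatPixels(short[] depth, int width, int height)
    {
        var flat = new bool[depth.Length];
        for (int y = 1; y < height - 1; y++)
        {
            for (int x = 1; x < width - 1; x++)
            {
                int i = y * width + x;
                if (depth[i] == 0) continue; // skip masked pixels

                // 3x3 Sobel kernels approximate the horizontal and
                // vertical depth gradient at this pixel.
                int gx = -depth[i - width - 1] + depth[i - width + 1]
                         - 2 * depth[i - 1] + 2 * depth[i + 1]
                         - depth[i + width - 1] + depth[i + width + 1];
                int gy = -depth[i - width - 1] - 2 * depth[i - width] - depth[i - width + 1]
                         + depth[i + width - 1] + 2 * depth[i + width] + depth[i + width + 1];

                // A low gradient magnitude means the surface is locally
                // flat, like the bottom of a basket held facing the camera.
                flat[i] = Math.Abs(gx) + Math.Abs(gy) < FlatThreshold;
            }
        }
        return flat;
    }
}
```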
The outline of the basket is determined using histograms of flat pixels across the horizontal and vertical dimensions, shown along the top and left edges of the right image in figure 1. The largest continuous run of flat pixels in each dimension is assumed to be the basket. The basket area is expanded slightly, smoothed across frames, and then hit-tested against the letters falling from the sky to determine when the student has caught a letter.
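Continuing the sketch, the bounding box can be recovered by projecting the flat-pixel mask onto each axis and taking the largest continuous run in each histogram. Again, the names and the noise floor are hypothetical, and the per-frame smoothing and slight expansion of the box are left out for brevity.

```csharp
using System.Drawing; // Rectangle; a WPF app would likely use System.Windows.Rect

static class BasketFinder
{
    // Projects the flat-pixel mask onto each axis and takes the largest
    // continuous run in each histogram as the basket's bounding box.
    public static Rectangle FindBasket(bool[] flat, int width, int height)
    {
        var colCounts = new int[width];
        var rowCounts = new int[height];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (flat[y * width + x]) { colCounts[x]++; rowCounts[y]++; }

        var (x0, x1) = LargestRun(colCounts);
        var (y0, y1) = LargestRun(rowCounts);
        return new Rectangle(x0, y0, x1 - x0 + 1, y1 - y0 + 1);
    }

    // Longest consecutive stretch of bins whose count clears a small,
    // hypothetical noise floor.
    static (int start, int end) LargestRun(int[] counts, int minCount = 3)
    {
        int bestStart = 0, bestEnd = -1, runStart = -1;
        for (int i = 0; i <= counts.Length; i++)
        {
            bool on = i < counts.Length && counts[i] >= minCount;
            if (on && runStart < 0) runStart = i;
            if (!on && runStart >= 0)
            {
                if (i - 1 - runStart > bestEnd - bestStart)
                {
                    bestStart = runStart;
                    bestEnd = i - 1;
                }
                runStart = -1;
            }
        }
        return (bestStart, bestEnd); // (0, -1) if nothing cleared the floor
    }
}
```

With a rectangle in hand, the per-letter hit test can be as simple as `basketRect.IntersectsWith(letterRect)` once the box has been expanded and smoothed as described above.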
In testing, we found this implementation to be robust even when the user moves the basket around quickly or holds it out at the end of one arm. In particular, we did not need to depend upon skeleton tracking, which was often interrupted by the basket itself.
One of our early Word Pop prototypes used hand-based interaction with skeleton tracking, but this was challenging for the youngest children in the target age range to use or understand. For example, given a prompt of “touch the letter M”, my three-year-old would always run to the computer screen to touch the “M” physically rather than moving her mirror image avatar to touch it. On the other hand, my seven-year-old used the avatar without a problem, illustrating the cognitive development milestone challenge I mentioned earlier. When we added the basket, skeleton tracking data became worse, but we could easily track the interactions of even the youngest children. Since “catching” with the basket has only one physical interpretation – using the avatar image – the younger kids started interacting without trouble.
The basket in Word Pop was a very simple and natural interaction that the children immediately understood. This may seem like a basic point, but it is a perfect example of what makes Kinect unique and important: Kinect lets the computer see and understand our real world, instead of us having to learn and understand the computer. In this case, the Kinect let the children reuse a skill they already had – catching things in baskets – and focus on the fun and educational aspects of the application, rather than being distracted by learning a complex interface.
I hope you enjoyed this behind-the-scenes look at our design process and at how we approached the challenge of designing fun and educational Kinect applications for young children. Thanks to Ben Lower for giving me the opportunity to record the videos in this post and the previous installments. Please feel free to comment or contact me if you have any questions or feedback on anything in this series. (Don’t forget to check out part 1 and part 2 if you haven’t seen those posts and videos already.)
Thanks for reading (and watching)!
-Josh
@joshblake | joshb@infostrat.com | mobile +1 (703) 946-7176 | https://nui.joshland.org