MAY 31, 2011 • Nintendo Co. Ltd. started a fire when it broke new ground in 2006, introducing motion-sensing controllers to great success. The consumer response to the Wii was so enormous that Microsoft Corp. and Sony Computer Entertainment were forced to develop their own motion input controllers. All three console makers have used distinctly different technologies, each capturing human motion with a different level of precision. But as the initial novelty starts to wear off with consumers, the question becomes: where is motion sensing going? What’s coming next?
SoftKinetic is a company that thinks it knows where motion sensing is going. Whereas players today mostly use broad body motions to play their video games, SoftKinetic vice president of marketing and business development Virgile Delporte sees games that recognize gestures as the next step for the technology. So DFC asked Delporte to explain where motion-sensing technology stands today, and what we can expect in the near future.
VD: You can place depth-sensing hardware devices in three categories:
● The first and oldest category is stereoscopic hardware: you combine two standard 2D images to generate a 3D image. This is a tried-and-true method, yet it depends on good lighting conditions to create a good-quality 3D image.
● The second and most complex technology is structured light, which is used in Kinect. In essence, the camera illuminates the scene with a laser, projecting a grid into space, then calculates how that grid deforms when people move through it. A chip then processes the information and generates a 3D image.
● The third and, to us, most promising technology is called time of flight: the camera lights the scene with either LEDs or infrared laser light, then calculates the time it takes for the light to travel to the obstacles (whether humans or objects) and back.
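The three approaches above can be illustrated with the textbook depth equations. The sketch below is a minimal Python illustration under stated assumptions, not SoftKinetic's DepthSense processing: stereo (and, in a similar way, structured light) recovers depth by triangulation, Z = f·B/d, for a rectified camera pair with focal length f in pixels, baseline B in metres and disparity d in pixels; time of flight halves the round-trip path of the light, either by timing a pulse directly or, as many continuous-wave sensors do, by measuring the phase shift of modulated light. The function names and the example rig parameters are hypothetical.

```python
import math

# Speed of light in a vacuum, in metres per second.
SPEED_OF_LIGHT = 299_792_458.0


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Triangulated depth for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px


def pulsed_tof_distance(round_trip_s: float) -> float:
    """Pulsed time of flight: light travels out and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0


def cw_tof_distance(phase_rad: float, mod_freq_hz: float) -> float:
    """Continuous-wave time of flight: distance from the phase shift of light
    modulated at mod_freq_hz. Only unambiguous up to c / (2 * mod_freq_hz)."""
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * mod_freq_hz)


if __name__ == "__main__":
    # Hypothetical stereo rig: 580 px focal length, 7.5 cm baseline, 29 px disparity.
    print(f"stereo:     {stereo_depth(580.0, 0.075, 29.0):.2f} m")
    # Light returning after 20 ns corresponds to an obstacle about 3 m away.
    print(f"pulsed ToF: {pulsed_tof_distance(20e-9):.3f} m")
    # A half-cycle (pi rad) phase shift at a hypothetical 20 MHz modulation.
    print(f"CW ToF:     {cw_tof_distance(math.pi, 20e6):.3f} m")
```

Running it prints 1.50 m, 2.998 m and 3.747 m. The numbers also show why time of flight demands very fine timing electronics: a 1 cm change in distance shifts the round trip by only about 67 picoseconds.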
Each of these technologies has pros and cons. In theory, a stereoscopic camera (composed of two standard RGB sensors) is very inexpensive to produce, yet it’s challenging to get accurate depth information from it. Structured light is mature and reliable, but doesn’t offer much flexibility and requires substantial processing power to analyze the image. Time of flight is widely accepted as the most promising and flexible technology. Case in point: Microsoft recently acquired two players in the field, including Canesta in November 2010. There is no consumer product based on time-of-flight technology on the market yet, but it’s coming pretty soon.
DFC: What is SoftKinetic’s area of expertise in motion sensing technology? What sets your firm apart?
VD: SoftKinetic has expertise in three unique areas. In development since 2003, our gesture recognition middleware, called “iisu” (the interface is you), is the most advanced SDK toolset for efficiently using a 3D depth image, regardless of which 3D camera provides that image. We also run a fabless hardware division producing DepthSense sensors and cameras. Finally, SoftKinetic Studios, our content division, is the leading provider of gesture-based interactive entertainment experiences.
No other company in the world can offer this end-to-end solution to OEMs. Yet, for those not interested in our complete package, our iisu middleware supports all 3D cameras and our DepthSense hardware works with a range of middleware. This is unique as it offers total flexibility to our customers.
DFC: Please give us a rundown of your major products, who you partner with, and which clients in the games industry have signed on to use SoftKinetic technology.
VD: Originally, our core product was our middleware (iisu), which was actually the foundation for all our initiatives. Having a strong middleware solution that works with all 3D cameras is a major strength. Today, our key strength is our ability to offer one integrated solution combining middleware, hardware and game design solutions or an à la carte menu, depending on the client’s needs. For instance, companies launching competing hardware are looking for content, and our studio has fun games they can bundle for the launch. In addition, our developer network is growing fast, and this is a huge asset for our OEM clients.
For the consumer market, we are engaged with most key consumer electronics companies around the world (set-top boxes, consoles, and smart TV providers), as well as Tier 1 and Tier 2 telecommunications suppliers. Some are about to announce their products, some are in heavy development mode and others are now pursuing gesture-based opportunities following the success of Kinect.
The games industry is changing dramatically. The blockbuster companies need to see critical mass before engaging in gesture-based game development beyond the scope of Kinect. Smaller companies, on the other hand, can afford to explore new territory, and are currently building high-quality projects. The top teams will make a fortune by licensing their catalogues of games to OEMs looking to release products with bundled titles.
DFC: Why do you think motion sensing has become viable for video game use in the last five years, and not before?
VD: Cost is the key factor. With Kinect, Microsoft made a bold move by bringing the consumer a technology that was previously reserved for the professional world. Nintendo did the exact same thing with the Wii a few years earlier: gyroscopes and accelerometers had been used in virtual reality for years, but had never been produced at this volume.
The individual components of a 3D camera are rather inexpensive; the challenge is mass-producing the hardware while also providing a great consumer experience, including a high-quality user interface and games. We believe that consumers are now ready to embrace gesture technology on a massive scale.
DFC: In what ways is motion-sensing input not yet being used in video games that it should be?
VD: While the industry grows rapidly, user experience remains key – there are only a handful of game designers with the creativity to create an intuitive and innovative user experience. One idea is to mix RGB and 3D information, to create bold new augmented reality games.
We believe that television will remain the centerpiece of the living room, with powerful set-top boxes and smart TVs providing a mix of TV & video games, social interactivity, MMOTVS (“massively multiplayer online TV shows”) and other new forms of entertainment.
DFC: What does 10 million Microsoft Kinects sold between November and March mean for the future of motion sensing?
VD: It means that the market is ready for the gesture revolution. The numbers are substantial, but until now, there has only been one vendor in a very niche market. That is all set to change.
We now anticipate growth to come from other vendors, on smart TVs and set-top boxes. We should not forget that the technology is also well suited to public spaces: there is no device to touch, which is great for hygiene, and it brings a new level of interaction for advertisers. Now that people are familiar with the technology, advertisers will harness gesture tech to ride the hype and interact with consumers more than a static billboard ever could.
DFC: How does gesture recognition differ from the motion sensing people are already familiar with via the Nintendo Wii, PlayStation Move, or Microsoft Kinect?
VD: All these technologies share the same goal: inviting people to interact with games in a new way and engage physically. If you put a traditional controller in the hands of a child for the first time, he’s going to spend more time concentrating on his hands and trying to hit the right button than watching the screen.
Now give him a Wii remote, and he’s likely to have more fun; then with Kinect the process is even simpler. At the end of the day, gaming should be for the masses as well as hardcore gamers. Until now, games have been created for gamers and those who sought out gaming consoles. With motion interaction, gaming is more accessible to more people; removing the controller creates a more natural, intuitive experience.
To sum up, Move and Wii mainly allow hand tracking. Kinect offers full body tracking, but all three require a game console.
As far as SoftKinetic is concerned, we’ve created a living room experience that includes games, TV, entertainment, home automation and more.
DFC: How can gesture recognition become part of gameplay beyond serving as an interface tool?
VD: Very interesting question! Full body tracking as a gamepad replacement is just the beginning of how we can use gesture recognition, both for players and developers. The camera has the potential to influence gameplay based on how you play, as it tracks your behavior, learns your signature movements, and observes what is happening in the living room beyond just the player.
There are many ways to involve the player beyond the user interface. However, it doesn’t always have to be while standing or jumping around – we strongly believe in couch-play in addition to more active sessions.
DFC: If you look at some of the new touchscreen game implementations, you can see cases of adventure games where you trace the route for the character to climb a rock face with your fingertip, or tell a character to grab a rope to swing over a gorge by tapping once on the rope. Can gesture recognition mimic this input now, and if not, how soon before it can?
VD: Though interesting, touchscreen gameplay offers only a basic level of interaction. You can do all that in a much more engaging way thanks to gesture recognition: grabbing a rope with your hands, mimicking swinging through the air with your full body, letting go of the rope and performing a solid landing is a totally unique experience!
Technically, full-hand tracking is a simple task; tracking fingers is possible, but much more complex, especially with small hands. We can mimic the touchscreen gestures you’ve described, but our methods will be different. In other words, you get the same gameplay experience while the camera merely tracks your palm.
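To make the palm-tracking idea concrete, here is a minimal Python sketch, assuming a tracker that already reports one palm position per frame (normalised x, y plus depth in metres). It treats a quick push of the palm toward the camera as the equivalent of a touchscreen tap. The `PalmSample` type, the `detect_push_tap` function and the 10 cm / 5-frame thresholds are hypothetical illustrations, not iisu's API or SoftKinetic's actual method.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PalmSample:
    x: float        # hypothetical normalised horizontal position, 0..1
    y: float        # hypothetical normalised vertical position, 0..1
    depth_m: float  # palm distance from the camera, in metres


def detect_push_tap(
    samples: Sequence[PalmSample], push_m: float = 0.10, window: int = 5
) -> Optional[tuple[float, float]]:
    """Return the (x, y) of a 'tap' if, within the last `window` frames, the
    palm has moved toward the camera by at least `push_m` metres; else None."""
    if len(samples) < 2:
        return None
    recent = samples[-window:]
    latest = recent[-1]
    # How much closer the palm is now than at its farthest point in the window.
    push_distance = max(s.depth_m for s in recent) - latest.depth_m
    if push_distance >= push_m:
        return (latest.x, latest.y)
    return None


if __name__ == "__main__":
    # The palm moves 12 cm toward the camera over four frames: a "tap".
    frames = [PalmSample(0.4, 0.6, d) for d in (1.50, 1.48, 1.44, 1.38)]
    print(detect_push_tap(frames))  # (0.4, 0.6)
    # A steady palm never triggers.
    print(detect_push_tap([PalmSample(0.4, 0.6, 1.50)] * 5))  # None
```

A real system would also smooth noisy depth readings and debounce repeated taps; the returned (x, y) could then be hit-tested against on-screen objects such as the rope in the question above.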
DFC: How far are we from having a baseball game where sensing hardware will be able to track the human wrist and fingers sufficiently well to put different kinds of spin on a baseball being pitched in-game?
VD: Not far! But for now, providing an arcade experience is best. It’s possible to have a realistic simulation, but it may involve constraints on the player, such as being very close to the screen.
Note that we strongly believe in the combination of full body tracking with a Move-like peripheral as the ultimate experience: combining the precision of a gyroscope and the immersive feeling of a full body tracking is truly awesome.
DFC: One criticism of gesture recognition options is that prolonged application promotes “Gorilla Arm” muscle strain. What do you do to address that criticism?
VD: This is a very valid point. You should blame game designers for this! More seriously, it will take time for designers to adopt this new paradigm and change their methods. If they simply port an existing game, they’ll likely fail to provide an entertaining experience.
DFC: How is motion sensing hardware going to change in the next five years? What will be the feature enhancements?
VD: Gesture recognition hardware will naturally follow the typical evolutionary path of modern electronics: more, cheaper, faster. In our case, this means a higher depth map resolution, which will improve tracking resolution dramatically and offer never-before-seen experiences. More importantly, however, gesture recognition will become part of our lives, for TV browsing, enhanced PC navigation, home automation, video conferencing – with background suppression – and in ways we have yet to discover. The technology will also be deployed in automobiles for passenger entertainment and pedestrian safety, for example.
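Depth data is what makes background suppression in video conferencing straightforward: pixels whose measured distance lies beyond the person in front of the camera can simply be blanked. The minimal sketch below assumes a depth map already registered (pixel-aligned) to the colour image and uses a single channel per pixel for brevity; the `suppress_background` name and the 1.5 m cut-off are hypothetical, and a real product would also clean up the mask edges.

```python
from typing import List

Image = List[List[int]]       # one channel per pixel, for brevity
DepthMap = List[List[float]]  # metres per pixel; 0.0 means "no valid reading"


def suppress_background(
    image: Image, depth: DepthMap, max_depth_m: float, fill: int = 0
) -> Image:
    """Keep pixels with a valid depth no farther than max_depth_m and replace
    everything else (background or missing depth) with `fill`."""
    if len(image) != len(depth) or any(len(a) != len(b) for a, b in zip(image, depth)):
        raise ValueError("image and depth map must have the same shape")
    return [
        [px if 0.0 < d <= max_depth_m else fill for px, d in zip(img_row, depth_row)]
        for img_row, depth_row in zip(image, depth)
    ]


if __name__ == "__main__":
    frame = [[10, 20, 30], [40, 50, 60]]
    depth = [[0.9, 2.5, 1.1], [0.0, 1.0, 3.0]]
    # The caller sits within 1.5 m; the wall behind is at 2.5-3 m.
    print(suppress_background(frame, depth, max_depth_m=1.5))
    # [[10, 0, 30], [0, 50, 0]]
```

Unlike colour-based chroma keying, this approach does not need a uniform backdrop, which is why a depth camera suits an ordinary living room.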
DFC: How fast can existing motion-sensing hardware be upgraded by firmware or software updates?
VD: This is a daily duty for our software teams, and a lot of the potential depends on high quality firmware to best utilize the hardware capabilities. Software is also critical, as solid middleware allows game and application designers to craft the very best user experiences. The more tools we provide, the faster they’ll be able to prototype: discarding the bad ideas is part of the learning process!
At GDC 2011, SoftKinetic announced a free version of iisu for non-commercial use. We’re getting a lot of traction, because we provide truly cross-platform middleware that’s compatible with all 3D camera technologies.
DFC: How much potential for buyer’s remorse is there for consumers who have purchased the current crop of motion-sensing devices when next-generation models arrive?
VD: This is a pretty open question. I think we need to look at this from a usage point of view. Does this new camera provide enough extra value to make me replace it? Is this new camera compatible with another device? For now, the only device available is the Kinect, and the potential for gesture recognition is far from topping out. As a result, new peripherals should be able to justify themselves.
DFC: Are consumers in one region of the world any more open to motion-sensing games and hardware than in another region? If so, why is that? For instance, does the wider adoption of Sony’s EyeToy in Europe make Europe an easier sell? Or does the predominance of sports games in North America make the U.S. an easier sell?
VD: We believe gesture recognition technologies are truly universal and not bound to any culture. However, the application must vary according to parameters as simple as the size of the living room, the size of the individuals, and other variables. At present, we’ve seen the greatest demand for these technologies in Asia.
DFC: How do game developers need to think differently to make the best use of motion-sensing hardware?
VD: They need to try and fail, try and fail, learning from their mistakes as quickly as possible. There aren’t a lot of games to serve as a basis of comparison, so it can be difficult to envision the ideal gameplay. One thing’s for sure: the best games are yet to come!
To some extent, the development process may be easier for younger game designers, as the experienced veterans may be hindered by their history with traditional input devices.
DFC: Which game developers are ahead of the pack in applying gesture recognition features?
VD: Naturally, those who started first are ahead. Some teams at Ubisoft are delivering great quality products. On a much smaller scale, our internal division SoftKinetic Studios brings solid experience to the field, with teams who began working with gesture over five years ago.
Overall, studios that are not afraid of fundamentally rethinking their game design will learn very fast. Small and large studios can achieve that goal, even with a AAA series, if they are motivated.
DFC: How big a difference does display screen size make in using gesture recognition? Are consumers more comfortable using their bodies when a 60-inch TV is used, compared to a 36-inch model? If so, why?
VD: This has to do with the quality of the immersion: the larger the screen, the greater the immersion. However, the quality of the interaction remains paramount; the more natural, the better. Players focus their attention on the magic of seeing their character move in time with them, so the size of the screen doesn’t matter as much. Similarly, you may cry watching an emotional movie on an airplane, even on an ugly, low-resolution ten-inch screen.
DFC: Does 3D gaming have any impact on motion-sensing adoption? How good a fit is motion input with 3D visuals?
VD: This is a great question. I don’t think this question has been properly tackled yet, though it makes sense that playing a game with a 3D screen will enhance the quality of the experience, particularly the feeling of touching nearby objects. The lack of development in the field is likely due to the relatively poor adoption of 3D screens to date. We’re looking forward to seeing what creative developers can do with 3D gaming. Our internal team has some cool new ideas that are currently in the prototyping stage.