With new technology come new fundamental interface paradigms. These paradigms don't change often and are intrinsically tied to hardware capabilities. As technology evolves, we tend to settle on a core interface that defines how developers build the products we come to love. In the early days of computing, we used printers and punch cards. Then the Mother of All Demos showed us the future: the display, mouse, and keyboard. These concepts are so powerful that almost 50 years later we are still using them. More recently, touchscreens, and in particular the capacitive touchscreen, have driven the smartphone revolution. Wayne Westerman and John Elias brought these to the world and paved the way for companies like Apple and Google to build the modern smartphone.
We are currently in a rare moment where we can see an approaching shift in technology. Outside of video games, where Nintendo does something wacky every five years or so, it is hard to think of another time in recent memory when we saw a shift coming and had time to ruminate on it before it emerged.
Mixed Reality is one of these shifts. In this post, I want to wildly speculate on what the core interface paradigm for this technology will be. How will we select the information we care about? How will we input text? How will we navigate between different sections of the UI? Of course, I am going to focus on Magic Leap, but I want to start somewhere else: HoloLens.
Wait a minute. This isn't an approaching shift in technology. It is already here. You can buy a HoloLens dev kit for $3,000 and start playing with it today. We have a fully working, thought-out interface for an MR product right now.
HoloLens isn't yet a consumer product. It is a work in progress. Hands-on articles, videos, and, seemingly, even Microsoft agree that the technology is amazing but the product isn't quite ready for consumer release. That said, the interface seems to be fairly well thought out, and I think Magic Leap will operate in a similar way.
HoloLens uses the position and orientation of a user's head, not their eyes, to determine their gaze vector.
HoloLens has three forms of interaction: gaze, gesture, and voice. To put these broadly in the context of how we use technology today, we can think of gaze as the mouse, gestures as the mouse buttons, and voice as the keyboard.
I worry about these interaction paradigms. We all know how annoying voice interaction is to use. We've all tried Siri and Google Now and then decided not to use them except in very rare situations. If the Wii, Kinect, and PlayStation Move proved anything, it is that gesture-based input tends to be terrible. The reason we get annoyed using voice input and gestures is feedback. Voice input is slow, and you don't know if the system is correctly picking up the words you are saying until you are done saying them. Even if it works 90% of the time, the 10% of the time it doesn't is so frustrating that you give up on it entirely. Gestures have the same problem: it can be hard to tell whether the system has actually registered your gesture or is just slow to respond. A mouse and keyboard would suffer the same issues if they were as unreliable, but they tend to be the most responsive part of a computer. You know your computer is really struggling when you can type faster than the characters are being displayed. Gaze, on the other hand, is less proven and perhaps has the most potential.
So what else can we do? Honestly, I have no idea. Gaze, gesture, and voice are the most obvious ways to approach this problem, and I expect Magic Leap to utilize them. Perhaps the only thing they can do is try to make these inputs as reliable as a mouse or keyboard. We see hints of this already. Magic Leap needs to do frame-by-frame, highly accurate eye tracking to achieve a light field display. That same tracking can be used for the gaze portion of the UI. I imagine you will simply have to look at something to select it in some way. We can already do this today: a company called Tobii has built eye tracking for your laptop.
Tobii shows what is available to consumers today, but in quite a different form factor. Given that Magic Leap's hardware will sit far closer to the eye, I would speculate they might be able to do a better job than Tobii. If the tracking is highly accurate, it may be as reliable as a mouse and alleviate any concerns about the gaze portion of the interface.
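To make that concrete, here is a minimal sketch of one way gaze selection could work: dwell-based selection, where looking at a target for long enough counts as a click. The class name, the half-second threshold, and the per-frame update loop are all my own assumptions for illustration, not anything Magic Leap or Tobii has described.

```python
class GazeSelector:
    """Dwell-based gaze selection: fire once the gaze rests on a target
    long enough. A hypothetical sketch; the threshold and API are assumptions."""

    def __init__(self, dwell_s=0.5):
        self.dwell_s = dwell_s   # seconds the gaze must rest before selecting
        self.target = None       # object currently under the gaze ray
        self.start = None        # timestamp when the gaze landed on it
        self.fired = False       # have we already selected this target?

    def update(self, target, now):
        """Feed in the object under the gaze ray each frame (or None).

        Returns the target exactly once, when the dwell threshold is crossed.
        """
        if target != self.target:
            # Gaze moved to a new target (or to nothing): restart the timer.
            self.target, self.start, self.fired = target, now, False
            return None
        if target is not None and not self.fired and now - self.start >= self.dwell_s:
            self.fired = True
            return target
        return None
```

Dwell alone has an obvious flaw: everything you rest your eyes on risks being selected, which is exactly why a confirming gesture matters.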
Of course, you don't want to select everything you look at, so gaze will have to be combined with a gesture, likely similar to HoloLens's "air tap". This seems to be confirmed in the patent images, which show a number of gesture interactions.
Gesture and voice input are problems that many companies have already poured huge resources into. I don't imagine Magic Leap will improve drastically on the status quo. In fact, there is a decent chance their first iteration will be worse at solving these problems than other companies' efforts. I think this will be okay, as long as they can nail at least one gesture. They need the air tap gesture to work. I imagine this is the primary way we will interact with objects around us. If it is unreliable, the system will be frustrating to use regardless of anything else.
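One way to picture gaze plus air tap is as a ray cast plus a commit: the gaze ray picks a target, and the tap confirms it, much like moving a mouse and then clicking. The sketch below is purely illustrative, assuming objects with simple bounding spheres; none of the names or the picking scheme come from Magic Leap or Microsoft.

```python
import math

def gaze_ray_pick(origin, direction, objects):
    """Return the nearest object whose bounding sphere the gaze ray hits.

    objects: list of (name, center, radius) with center as an (x, y, z) tuple.
    """
    mag = math.sqrt(sum(c * c for c in direction))
    d = tuple(c / mag for c in direction)  # unit gaze direction
    best, best_t = None, float("inf")
    for name, center, radius in objects:
        oc = tuple(c - o for c, o in zip(center, origin))
        t = sum(a * b for a, b in zip(oc, d))  # distance along ray to closest approach
        if t < 0:
            continue  # object is behind the user
        closest = tuple(o + t * di for o, di in zip(origin, d))
        dist_sq = sum((c - p) ** 2 for c, p in zip(center, closest))
        if dist_sq <= radius * radius and t < best_t:
            best, best_t = name, t
    return best

def on_air_tap(origin, direction, objects):
    """The air tap commits whatever the gaze ray currently points at."""
    return gaze_ray_pick(origin, direction, objects)
```

The split matters: gaze runs every frame and is cheap to ignore, while the tap is the single event that has to be detected reliably.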
The patent image above shows another important interaction: the home button. It is easy to get lost in a UI. Things like the Start button on Windows or the home button on smartphones are vital for finding your way back to somewhere familiar.
Magic Leap's patent documentation talks extensively about totems.
"[Totems are] Physical objects which are manipulable by the user to allow input or interaction with the AR system. These physical objects are referred to herein as totems. Some totems may take the form of inanimate objects, for example a piece of metal or plastic, a wall, a surface of table. Alternatively, some totems may take the form of animate objects, for example a hand of the user."
From this description, a totem can be just about anything that Magic Leap recognizes. Many of the totems described are blank slates: simple objects that are given life by projecting holograms onto them. One such example is a blank rectangle onto which a keyboard is projected. Another is a blank mouse. Since a totem's dimensions and properties are known quantities, recognizing gestures pertaining to it may be simpler.
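A sketch of why known dimensions help: if the system knows a blank totem's physical size, mapping a detected touch on its surface to a position in the projected UI is just proportional scaling. The function name, units, and example sizes below are my own assumptions, not anything from the patent.

```python
def surface_to_ui(touch_m, totem_size_m, ui_px):
    """Map a touch point on a totem's surface to pixel coordinates in the
    hologram rendered onto it.

    touch_m: (u, v) touch position in metres from one corner of the totem.
    totem_size_m: (width, height) of the physical totem in metres.
    ui_px: (width, height) of the projected UI in pixels.
    """
    u, v = touch_m
    w_m, h_m = totem_size_m
    w_px, h_px = ui_px
    # Proportional scaling: position on the slab maps linearly to the UI.
    return (round(u / w_m * w_px), round(v / h_m * h_px))
```

With an arbitrary surface, the system would first have to estimate its extent; with a known totem, this mapping is fixed ahead of time.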
The point of totems seems to be to give physical presence to holograms. They are described as empty shells: useless shapes that become useful only when the context of a hologram is projected onto them. One example is particularly interesting. The patent document outlines how a blank piece of aluminum becomes a smartphone.
"[T]he AR system may render the user interface of an Android phone onto a surface of an aluminum sheet. The AR system may detect interaction with the rendered virtual user interface, for instance via a front facing camera, and implement functions based on the detected interactions."
So, you heard it here first folks. Your next smartphone might be a slab of aluminum.
Magic Leap is not one to go down the path most travelled. So what might they be doing that is crazy, out there, and lines up with some of the "GPU of the brain" type talk we keep hearing from them? There is one line in the patent that implies an entirely different interface from what has been described so far. They are building a device that sits on your head, so this might be the first time a company can reasonably build an EEG into its product. From the patent:
"[T]he system may measure neurological signals and use that as an input for the system. The system may have a sensor that tracks brain signals and map it against a table of commands. In other words, the user input is simply the user's thoughts, that may be measured by the user's brain signals. This may also be referred to as subvocalization sensing. Such a system may also include apparatus for sensing EEG data to translate the user's "thoughts" into brain signals that may be decipherable by the system."
It is positioned as almost an afterthought in the patent, so I don't think they are actually trying to implement "subvocalization sensing". Well, at least not in the first iteration.
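Still, the "table of commands" idea from the patent quote is easy to sketch: classify an EEG feature vector against stored templates, then look the match up in a command table. Everything below — the template vectors, the feature representation, and the distance threshold — is invented for illustration; real EEG decoding is far messier than a nearest-neighbour lookup.

```python
import math

# Hypothetical "templates": feature vectors for known thought patterns.
TEMPLATES = {
    "select": [0.9, 0.1, 0.2],
    "go_home": [0.1, 0.8, 0.3],
}

# The patent's "table of commands": pattern name -> system command.
COMMANDS = {"select": "SELECT", "go_home": "GO_HOME"}

def classify(features, max_distance=0.5):
    """Nearest-template match; None if nothing is close enough."""
    best, best_d = None, float("inf")
    for name, template in TEMPLATES.items():
        d = math.dist(features, template)  # Euclidean distance (Python 3.8+)
        if d < best_d:
            best, best_d = name, d
    return best if best_d <= max_distance else None

def signal_to_command(features):
    """Map a classified EEG pattern to a command, as the patent describes."""
    pattern = classify(features)
    return COMMANDS.get(pattern)
```

The rejection threshold is the important design choice: as with voice and gesture, a brain-signal interface that fires on noise would be worse than one that does nothing.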