Keeping the virtual world stable in VR
As described in a previous blog post, VR will offer unprecedented experiences and unlimited possibilities that allow us to virtually teleport ourselves to any place, both real and virtual. To make this a reality and create the feeling of presence, we must meet the extreme requirements of fully immersive VR. Some of these important requirements are new concepts with new terminology, so to better understand them let’s first develop intuition about how VR works.
How VR works
Creating a sense of physical presence is all about convincing the brain. The brain creates reality based on sensory inputs and interactions with the environment. In other words, our reality is formed by our sensory inputs, which create our perception of the world around us. Think about where you are right now and what makes you believe you are there. How about when you first wake up in the morning or from a nap – you quickly perceive that you are at home on your couch, in a hotel on a bed, etc. What makes reality feel real?
It is a combination of a few things. Our senses (what we see, hear, smell, feel, and taste) are stimulated by the world around us, and our brain learns to associate these sensory inputs with a perception of reality. We don’t need all of our senses to be stimulated to create presence, but the more the better, with vision being the most important. One sense that is crucial for VR but is not typically listed as one of the “five senses” comes from the vestibular system. The vestibular system, located primarily in the inner ear, provides the sense of balance and spatial orientation, helping us move without falling and know the orientation of our head.
As we interact with the real world, receiving the appropriate sensory responses is crucial to presence. If we turn our head to the left, then we immediately see whatever is to the left and hear the sounds of our environment a little differently. If we touch an object, we feel the appropriate texture and resistance.
To replicate this, VR needs to create sensory harmony, which means keeping sensory inputs and interactions in sync. It is conflicts between sensory inputs that take us out of immersion and make us feel uncomfortable or sick. Our senses need to be stimulated at the right time with the right inputs; otherwise, our brain will detect that something is wrong. With this simplified view of how we perceive reality, many of the challenges in VR can be traced back to different combinations of sensory conflicts. In this blog post, I’ll focus on one of them: the conflict between vision and the vestibular system.
Motion to photon latency
One of the biggest challenges for VR is reducing the time between an input movement (such as a head turn) and the screen being appropriately updated (light emitted from the updated display). This interval is known as “motion to photon” (MTP) latency.
Research has determined that an MTP latency of roughly 20 milliseconds (ms) or less is required for a good user experience in VR. To put this challenge in perspective, a display running at 60 Hz is updated about every 17 ms, and a display running at 90 Hz about every 11 ms. Lag in the user interface (UI) beyond 20 ms will be apparent to the user and undermine immersion, and it may also make the user feel sick. UI latency is not a new challenge; what is new is the magnitude of improvement required.
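The arithmetic behind these figures is simple: the refresh period is 1000 ms divided by the refresh rate. This small sketch (the function and constant names are illustrative, not from any SDK) shows how few refresh intervals fit inside a 20 ms budget.

```python
MTP_BUDGET_MS = 20.0  # approximate MTP target from the research mentioned in the text


def frame_period_ms(refresh_hz: float) -> float:
    """Time between display refreshes, in milliseconds."""
    return 1000.0 / refresh_hz


for hz in (60, 90):
    period = frame_period_ms(hz)
    # Number of refresh intervals that fit inside the MTP budget.
    intervals = MTP_BUDGET_MS / period
    print(f"{hz} Hz: {period:.1f} ms per refresh, {intervals:.1f} refreshes per {MTP_BUDGET_MS:.0f} ms")
```

At 60 Hz, if a full refresh interval were spent on the display alone, only about 3 ms of the budget would remain for everything else in the pipeline.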
Reducing MTP is key to stabilizing the virtual world as the user moves. There are many processing steps required before updating the display. The end-to-end path includes sampling a variety of inertial and visual sensors, sensor fusion, view generation, render / decode, image correction, and updating the display.
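One way to reason about this path is as a latency budget: if the stages run strictly one after another, their latencies add up. The sketch below uses the stage names from the paragraph above, but every latency value is a hypothetical placeholder chosen for illustration, not a measurement of any device.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    latency_ms: float  # HYPOTHETICAL placeholder, not a measured value


# Stage names follow the end-to-end path in the text; numbers are illustrative only.
PIPELINE = [
    Stage("sensor sampling", 1.0),
    Stage("sensor fusion", 0.5),
    Stage("view generation", 2.0),
    Stage("render / decode", 8.0),
    Stage("image correction", 2.0),
    Stage("display update", 5.0),
]


def total_latency_ms(stages):
    """Latency of a strictly serial pipeline: the sum of its stage latencies."""
    return sum(s.latency_ms for s in stages)


def report(stages, budget_ms=20.0):
    """Print each stage's share of the total and whether the total fits the budget."""
    total = total_latency_ms(stages)
    for s in stages:
        share = 100.0 * s.latency_ms / total
        print(f"{s.name:<18}{s.latency_ms:5.1f} ms ({share:4.1f}%)")
    status = "within" if total <= budget_ms else "over"
    print(f"{'total':<18}{total:5.1f} ms ({status} {budget_ms:.0f} ms budget)")
    return total


report(PIPELINE)
```

A breakdown like this makes it obvious which stage dominates and therefore where optimization effort pays off first.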
In addition to optimizing for speed, producing realistic visuals requires accurate motion tracking to make sure that the view the user sees in the virtual world correctly corresponds to the user’s current position and orientation in 3D space.
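To give a feel for what sensor fusion contributes to tracking accuracy, here is a one-axis complementary filter, a common textbook technique that is assumed here for illustration; the source does not say which fusion algorithm Snapdragon uses. A gyroscope is smooth but drifts when integrated, while an accelerometer’s gravity-based tilt is noisy but drift-free; the filter blends the two. The 1 kHz rate, the bias, and the `alpha` weight are hypothetical values.

```python
def complementary_filter(gyro_rates, accel_angles, dt, alpha=0.98, initial_angle=0.0):
    """Fuse gyro rate (rad/s) and accelerometer tilt (rad) into one angle estimate per sample.

    alpha weights the integrated gyro (smooth, drifts);
    1 - alpha weights the accelerometer tilt (noisy, drift-free).
    """
    angle = initial_angle
    estimates = []
    for rate, acc_angle in zip(gyro_rates, accel_angles):
        # Propagate with the gyro, then pull gently toward the accelerometer's tilt.
        angle = alpha * (angle + rate * dt) + (1.0 - alpha) * acc_angle
        estimates.append(angle)
    return estimates


dt = 0.001               # assumed 1 kHz IMU sampling
n = 2000                 # two seconds of samples
true_tilt = 0.1          # rad; head held still at a fixed tilt
gyro = [0.01] * n        # gyro reports only a constant bias (rad/s)
accel = [true_tilt] * n  # noiseless accelerometer tilt, for simplicity
fused = complementary_filter(gyro, accel, dt, initial_angle=true_tilt)
gyro_only = true_tilt + sum(rate * dt for rate in gyro)  # dead reckoning accumulates the bias
print(f"fused: {fused[-1]:.4f} rad, gyro only: {gyro_only:.4f} rad")
```

With these numbers the fused estimate stays within about 0.0005 rad of the true tilt, while gyro-only integration drifts by 0.02 rad. Gravity gives no reference for yaw, so a real tracker needs another source for that axis, such as the visual sensors mentioned above.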
Measuring motion to photon latency
Since MTP is such an important metric for the VR experience, the VR industry needs to agree on a consistent measurement methodology and terminology. What to measure, how to measure it, and how to grade the visual quality are all relevant questions. Qualcomm has taken the lead with industry partners to begin the process of standardizing this metric.
It is also important to remember that MTP is a device metric, so it is influenced by the combination of several components, such as the sensors, SoC, display, and software. For example, two devices with identical components except for the display could have different MTP: OLEDs have low persistence and update pixels as they are ready, while LCDs have higher persistence and update pixels all at once at the end of a frame. A key challenge for all companies within the VR ecosystem going forward will be to collectively minimize their respective contributions to MTP.
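The display difference can be captured in a deliberately simplified model: a “rolling” display lights each row as it is scanned out (the OLED behavior described above), while a “global” display lights every pixel at the end of the frame (the LCD behavior described above). The function and mode names are hypothetical, and the model ignores pixel response time, persistence, and any panel-specific timing.

```python
def pixel_display_latency_ms(row_fraction, refresh_hz, mode):
    """Time from the start of frame scan-out until a pixel lights, in a simplified model.

    row_fraction: 0.0 = first row, 0.5 = middle row, 1.0 = last row.
    mode: "rolling" lights each row as it arrives; "global" lights all pixels at frame end.
    """
    if not 0.0 <= row_fraction <= 1.0:
        raise ValueError("row_fraction must be in [0, 1]")
    period_ms = 1000.0 / refresh_hz
    if mode == "rolling":
        return row_fraction * period_ms
    if mode == "global":
        return period_ms
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("rolling", "global"):
    timings = [pixel_display_latency_ms(f, 60, mode) for f in (0.0, 0.5, 1.0)]
    print(mode, " / ".join(f"{t:.1f} ms" for t in timings))
```

Under this model, at 60 Hz the first, middle, and last rows of a rolling display light roughly 0, 8.3, and 16.7 ms into scan-out, while a global display lights all of them at 16.7 ms. The display alone can therefore shift a device’s measured MTP, and on a rolling display the result also depends on which pixel is chosen as the end point.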
In terms of what to measure, the start point is the onset of a sudden rotational movement of the head (for 3-degree-of-freedom MTP). The end point is when a pixel on the display lights up, which could be the first, middle, or last pixel on the screen. We recommend selecting the middle pixel (center of the screen), since that keeps some of the latency associated with the selected display and more accurately reflects the user experience. Multiple measurements should be taken to determine the average MTP, but it is also important to note the maximum MTP. As this discussion shows, results can differ widely depending on what you measure.
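Reporting both the average and the maximum matters because a single slow frame can break immersion even when the average looks fine. This minimal sketch summarizes repeated measurements; the sample values are invented purely for illustration, and the function name is hypothetical.

```python
from statistics import mean


def summarize_mtp(samples_ms, budget_ms=20.0):
    """Summarize repeated MTP measurements: count, average, worst case, and budget misses."""
    if not samples_ms:
        raise ValueError("need at least one measurement")
    return {
        "count": len(samples_ms),
        "mean_ms": mean(samples_ms),
        "max_ms": max(samples_ms),
        "over_budget": sum(1 for s in samples_ms if s > budget_ms),
    }


# HYPOTHETICAL measurements (ms), for illustration only.
samples = [17.2, 18.0, 16.9, 17.5, 24.1, 17.8]
print(summarize_mtp(samples))
```

Here the mean stays under 20 ms while one measurement exceeds it, which an average alone would hide.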
In terms of how to measure, there are various methods that differ in terms of accuracy, complexity, and cost. Methods range from using a latency dongle or a universal sensor API (to stream in a defined motion pattern) to using a robotic arm, a camera tracking rig, or ground truth sensors. Whether one or multiple methods win out is to be determined.
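Whatever rig is used, the recorded traces still have to be reduced to a latency number. One generic approach, assumed here for illustration and not specified in the source, is to record the reference motion (for example from a ground-truth sensor) and the motion seen on the display (for example by a camera), then find the time shift at which the two signals correlate best. The function names, the 1 kHz sampling rate, and the synthetic signals are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def estimate_latency_ms(motion, display, sample_rate_hz, max_lag):
    """Estimate how far `display` lags `motion` by maximizing correlation over candidate lags."""
    best_lag, best_r = 0, -math.inf
    for lag in range(max_lag + 1):
        # Compare motion[t] with display[t + lag] over the overlapping samples.
        n = min(len(motion), len(display) - lag)
        r = pearson(motion[:n], display[lag:lag + n])
        if r > best_r:
            best_lag, best_r = lag, r
    return 1000.0 * best_lag / sample_rate_hz


rate = 1000.0  # assumed 1 kHz sampling of both traces
motion = [math.sin(0.05 * t) + 0.5 * math.sin(0.13 * t) for t in range(2000)]
delay = 18     # synthetic delay in samples (18 ms at 1 kHz)
display = [motion[0]] * delay + motion[:-delay]
print(estimate_latency_ms(motion, display, rate, max_lag=50))  # 18.0
```

Normalizing each window (Pearson rather than a raw dot product) keeps shorter overlaps at larger lags from being unfairly penalized. The search range should stay shorter than the motion’s repetition period, or a later cycle could match just as well.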
Grading the visual quality is also an open question, and a hard one. The absolute error versus a golden image might not be the best way to measure the error perceived by the human visual system. For example, foveated rendering, prediction, and time warp may significantly change an image without affecting the visual quality the user perceives. However, completely skipping important parts of the visual processing, or implementing them badly, must be detected. Grading the visual quality clearly needs a thorough investigation.
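A toy example shows why absolute error can mislead. On a deliberately contrived high-frequency stripe pattern, shifting the whole image by one pixel (a crude stand-in for the small reprojection a time warp might apply) produces a far larger mean absolute error than blacking out a block of real content. The images and helper names are hypothetical, and a real image would not behave this extremely.

```python
def mean_abs_error(image, golden):
    """Mean absolute per-pixel difference between two equal-sized grayscale images."""
    if len(image) != len(golden) or any(len(r) != len(g) for r, g in zip(image, golden)):
        raise ValueError("images must have the same shape")
    total = sum(abs(p - q) for r, g in zip(image, golden) for p, q in zip(r, g))
    return total / sum(len(r) for r in golden)


def shift_right(image, pixels):
    """Shift every row right by `pixels`, repeating the left-edge value to fill the gap."""
    return [[row[0]] * pixels + row[:-pixels] for row in image]


# Golden image: 8x8 vertical stripes alternating 0 and 255 every column.
golden = [[255 if x % 2 else 0 for x in range(8)] for _ in range(8)]

shifted = shift_right(golden, 1)  # content intact, just moved by one pixel

dropped = [row[:] for row in golden]  # a 2x2 block of content missing
for y in range(2):
    for x in range(2):
        dropped[y][x] = 0

print(f"shifted: {mean_abs_error(shifted, golden):.3f}, dropped: {mean_abs_error(dropped, golden):.3f}")
```

The one-pixel shift scores about 28 times worse than the missing block, the opposite of what matters to a viewer, which is why a perceptual metric would likely be needed.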
Minimizing motion to photon latency
At Qualcomm Technologies, we’ve spent a lot of time and effort figuring out how to reduce MTP latency. Minimizing MTP requires an end-to-end approach that reduces the latency of individual processing tasks, runs as many tasks in parallel as possible, and takes appropriate shortcuts in processing. An optimized solution requires hardware (an SoC with all the functionality on the same chip), software, sensors, and display all working together in harmony. Knowledge in all these areas is required to make the appropriate optimizations and design choices. Possible optimizations to reduce latency include high sensor sampling rates, high frame rates, hardware streaming¹, late latching², asynchronous time warp, and single buffer rendering. Please refer to our VR white paper for more details.
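The benefit of running tasks in parallel can be estimated by treating the pipeline as a dependency graph: with enough parallel resources, latency is set by the longest chain of dependent tasks (the critical path) rather than by the sum of all tasks. The task graph and durations below are hypothetical, and the model assumes unlimited parallel hardware and no scheduling overhead.

```python
from functools import lru_cache

# HYPOTHETICAL task graph: name -> (duration_ms, prerequisite task names).
TASKS = {
    "imu_sample": (1.0, ()),
    "camera_sample": (3.0, ()),
    "sensor_fusion": (0.5, ("imu_sample", "camera_sample")),
    "render": (8.0, ("sensor_fusion",)),
    "image_correction": (2.0, ("render",)),
    "display_update": (5.0, ("image_correction",)),
}


def serial_latency_ms(tasks):
    """Latency if every task runs one after another."""
    return sum(duration for duration, _ in tasks.values())


def critical_path_ms(tasks, final_task):
    """Latency if independent tasks overlap: the longest dependency chain ending at final_task."""

    @lru_cache(maxsize=None)
    def finish_time(name):
        duration, deps = tasks[name]
        # A task can start only once its slowest prerequisite has finished.
        return duration + max((finish_time(d) for d in deps), default=0.0)

    return finish_time(final_task)


print(serial_latency_ms(TASKS), critical_path_ms(TASKS, "display_update"))
```

Here sampling the IMU and camera concurrently saves the IMU’s 1 ms. The same analysis shows that shortening a task off the critical path does nothing for MTP, so effort should go to tasks on it.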
Qualcomm Technologies is uniquely positioned to support superior mobile VR experiences by designing for leading MTP. By taking a system approach and custom designing specialized engines across the SoC, Qualcomm Snapdragon processors are engineered to provide an efficient heterogeneous computing solution that is optimized end to end for latency, power, and performance. For example, the Snapdragon 820 processor is designed to meet the VR processing demands within the thermal and power constraints of a wearable VR headset. Our optimized system software and Snapdragon VR SDK have been developed in harmony with Snapdragon 820. We are proud of our initial MTP results on a variety of upcoming mobile VR devices built in cooperation with our OEM partners and look forward to sharing those results.