Friday, February 13, 2015

A look at the Leap Motion: Seeing your hands in VR

In many VR demos you are just a floating head in space. For me, this breaks the immersion as it makes me feel like I am not really part of the virtual world. Demos that include a body feel more immersive, but they are also a bit frustrating. I want my avatar’s hands to move when my hands do. To experiment with getting my hands into the scene, I got a Leap Motion controller.

When using the Leap with the Rift, you need to mount it on the Rift itself using a small plastic bracket. You can purchase the bracket from Leap, but they also make the model available on Thingiverse so you can print one out yourself if you have a 3D printer. (I do, and I thought that was very cool. I really felt like I was living in the future, printing out a part for my VR system.)

Once I had the mount printed out and attached to my Rift and had completed the Leap setup instructions, I gave some of the available VR demos a try. Seeing hands in the scene really made it feel a lot more immersive, but what really upped the immersion was seeing hands that looked almost like mine. The Leap development package includes a nice variety of hand models (by their naming conventions, I'm a light salt), and that variety is greatly appreciated.

When running the demos, the biggest problems I had with the Leap were false-positive hands (extra hands) in the scene, having my hands disappear rather suddenly, and poor tracking of my fingers. Two things that helped were making sure the Rift cables were not in front of the Leap controller and removing or covering reflective surfaces in my office (particularly the arm rest on my chair). Even with those changes, having the perfect office setup for the Leap is still a work in progress.

I’ve downloaded the Unity core assets and I’ll be talking more about developing for the Leap using Unity in future posts. Here’s a preview of what I am working on:

Wednesday, February 4, 2015

Unity 4.6: Silent conversation - Detecting head gestures for yes and no

One of the demos that I have really enjoyed is "Trial of the Rift Drifter" by Aldin Dynamics. In this demo you answer questions by nodding or shaking your head for yes and no. This is a great use of the head tracker data beyond changing the user's point of view, and it is a mechanic that I would like to use in my own applications as it really adds to the sense of immersion.

As an example, I updated the thought bubbles scene I created earlier to allow a silent conversation with one of the people in the scene, and this blog post will cover exactly what I did.



In my scene, I used a world-space canvas to create the thought bubble. This canvas contains a canvas group (ThoughtBubble) which contains an image UI object and a text UI object.

Hierarchy of the world space canvas  
I wanted the text in this canvas to change in response to the user shaking their head yes or no. I looked at a couple of different ways of detecting nods and head shakes, but ultimately went with a solution based on this project by Katsuomi Kobayashi.

To use the gesture recognition solution from that project, I first added the two Rift Gesture files (RiftGesture.cs and MyMath.cs) to my own project and then attached the RiftGesture.cs script to the ThoughtBubble.

When you look at RiftGesture.cs, there are two things to take note of. First, you’ll see that to get the head orientation data, it uses:

OVRPose pose = OVRManager.display.GetHeadPose();
Quaternion q = pose.orientation;


This gets the head pose data from the Rift independent of any other input. When I first looked at adding head gestures, I tried using the transform from one of the cameras, on the logic that the camera transform follows the head pose. Using the camera transform turned out to be problematic because the transform can also be affected by input from devices other than the headset (keyboard, mouse, gamepad), resulting in a headshake being detected when the user rotated the avatar with the mouse rather than shaking their head. Using OVRManager.display.GetHeadPose() ensures you are evaluating only data from the headset itself.

Second, you will also notice that it uses SendMessage in DetectNod() when a nod has been detected:

SendMessage("TriggerYes", SendMessageOptions.DontRequireReceiver);

and in DetectHeadshake() when a headshake has been detected:

SendMessage("TriggerNo", SendMessageOptions.DontRequireReceiver);

The next step I took was to create a new script (conversation.cs) to handle the conversation. This script contains a bit of setup to get and update the text in the canvas and to make sure that the dialog is visible to the user before it changes. (The canvas group's visibility is set by the canvas group's alpha property.) Most importantly, this script contains the TriggerYes() and TriggerNo() functions that receive the messages sent from RiftGesture.cs. These functions simply update the text when a nod or headshake message has been received. I attached the conversation.cs script to the ThoughtBubble object and dragged the text object from the canvas onto the questionholder field so that the script would know which text to update.
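
The details will vary with what you want the conversation to say, but a minimal sketch of the receiving end might look something like this (the field names and reply strings here are placeholders):

using UnityEngine;
using UnityEngine.UI;

// Minimal sketch of the receiving end -- placeholder text and field names.
public class conversation : MonoBehaviour {

    public Text questionholder;         // drag the canvas's Text object here
    public CanvasGroup thoughtBubble;   // used to check that the dialog is visible

    void TriggerYes() {
        // Only respond if the thought bubble is currently visible
        if (thoughtBubble != null && thoughtBubble.alpha > 0) {
            questionholder.text = "Glad to hear it!";
        }
    }

    void TriggerNo() {
        if (thoughtBubble != null && thoughtBubble.alpha > 0) {
            questionholder.text = "Sorry to hear that.";
        }
    }
}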

Scripts attached to the ThoughtBubble canvas group

At this point I was able to build and test my scene and have a quick telepathic conversation with one of the characters.


Friday, January 9, 2015

Unity 4.6: Creating a look-based GUI for VR

In a previous post, I talked about creating GUIs for VR using world space canvases. In that example, the GUI only displayed text - it didn't have any input components (buttons, sliders, etc.). I wanted to add a button above each thought bubble that the user could click to hear the text read aloud.



As I had used a look-based interaction to toggle the visibility of the GUI, this brought up the obvious question: how do I use a similar interaction for GUI input? And, importantly, how do I do it in a way that takes advantage of Unity's GUI EventSystem?

Turns out, what's needed is a custom input module that detects where the user is looking. There is an excellent tutorial posted on the Oculus forums by css that is a great place to start. That tutorial includes the code for a sample input module and walks you through the process of setting up the GUI event camera. (You need to assign an event camera to each canvas, and one twist is that the OVR cameras don't seem to work with the GUI.) By following that tutorial, I was able to get look-based input working very quickly.
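
The module from the tutorial is more complete than this, but as a rough sketch of the underlying idea (not the tutorial's code), a gaze-based input module raycasts from the center of the event camera's view each frame and triggers the looked-at element - here with a key press standing in for the click, and eventCamera being whatever camera you assigned to the canvases:

using UnityEngine;
using UnityEngine.EventSystems;

// Rough sketch of a gaze-based input module.
// It raycasts from the center of the event camera's view each frame and
// "clicks" whatever UI element is under the gaze when the space bar is pressed.
public class LookInputModule : BaseInputModule {

    public Camera eventCamera;   // the camera assigned as the canvases' event camera

    private PointerEventData pointerData;

    public override void Process() {
        if (eventCamera == null)
            return;

        if (pointerData == null)
            pointerData = new PointerEventData(eventSystem);

        // The gaze is always at the center of the event camera's view
        pointerData.Reset();
        pointerData.position = new Vector2(eventCamera.pixelWidth * 0.5f,
                                           eventCamera.pixelHeight * 0.5f);

        // Let the EventSystem's raycasters (e.g. each canvas's GraphicRaycaster)
        // figure out what the user is looking at
        eventSystem.RaycastAll(pointerData, m_RaycastResultCache);
        pointerData.pointerCurrentRaycast = FindFirstRaycast(m_RaycastResultCache);
        m_RaycastResultCache.Clear();

        GameObject target = pointerData.pointerCurrentRaycast.gameObject;
        if (target == null)
            return;

        // Use a key press as the stand-in for a click on the looked-at element
        if (Input.GetKeyDown(KeyCode.Space)) {
            ExecuteEvents.ExecuteHierarchy(target, pointerData, ExecuteEvents.submitHandler);
        }
    }
}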

While look-based interactions are immersive and fairly intuitive to use, it is worth keeping in mind that look-based input won't work in all situations. For example, if you have attached the GUI to the CenterEyeCamera to ensure that the user always sees the GUI, the GUI will follow the user's view, meaning the user won't be able to look at any one specific option.

Friday, December 12, 2014

Unity 4.6: Thought bubbles in a Rift scene using world space canvases

I'm really liking the new GUI system in 4.6. I had been wanting to play a bit with a comic-book-style VR environment, and with world space canvases, now is the time.


 

Here's a quick rundown of how I created the character thought bubbles in this scene using world space canvases.

Creating world space canvases

Canvases are the root object for all Unity GUI elements. By default they render to screen space, but you also have the option of rendering the canvas in world space, which is exactly what you need for the Rift. To create a canvas, from the Hierarchy menu, select Create > UI > Canvas. When you create a canvas, both a Canvas object and an Event System object are added to your project. All UI elements need to be added as children of a Canvas. Each thought bubble consists of a world-space Canvas and two UI elements - an image and a text box. For organization, I put the UI elements in an empty gameObject called ThoughtBubble.





Note: Hierarchy order is important, as UI objects are rendered in the order they appear in the hierarchy.

To have the canvas render as part of the 3D scene, in the Inspector for the Canvas, set the Render Mode to World Space.




When you change the render mode to world space, you'll note that the Rect Transform for the canvas becomes editable. Screen space canvases default to the size of the screen; for world space canvases, you need to set the size manually to something appropriate to the scene.

Setting canvas position, size, and resolution

By default the canvas is huge. If you look in the Inspector, you'll see that it has Width and Height properties as well as Scale properties. The Width and Height properties control the resolution of the GUI. (In this scene the Width and Height are set to 400 x 400. The thought bubble image is a 200 x 200 px image, and the font used for the Text is 24pt Arial.) To change the size of the canvas, you need to set the Scale properties.



To give you an idea of the proportions, the characters in the scene are all just under 2 units high, and the scale of each canvas is set to 0.005 in all directions. With the canvas at a reasonable size, I positioned each canvas just above the character.
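
I did all of this in the Inspector, but if you prefer to configure the canvas from a script, a minimal sketch might look like the following. The Y offset here is an assumption (it presumes the canvas is parented to the character) - adjust it so the bubble sits just above your character.

using UnityEngine;

// Sketch: setting the canvas render mode, resolution, scale, and position
// from code instead of the Inspector. The resolution and scale match the
// values used in this scene; the Y offset is an assumed value.
public class canvassetup : MonoBehaviour {

    void Start() {
        Canvas canvas = GetComponent<Canvas>();
        canvas.renderMode = RenderMode.WorldSpace;

        RectTransform rt = GetComponent<RectTransform>();
        rt.sizeDelta = new Vector2(400f, 400f);              // GUI resolution
        rt.localScale = new Vector3(0.005f, 0.005f, 0.005f); // world-space size
        rt.localPosition = new Vector3(0f, 2.2f, 0f);        // just above the character
    }
}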

Rotating the canvas with the player's view

For the thought bubble to be readable from any direction, I attached a script to the Canvas that sets the canvas transform to look at the player.

using UnityEngine;
using System.Collections;

public class lookatplayer : MonoBehaviour {
    public Transform target;
    void Update() {
        transform.LookAt(target);
    }
}


Toggling canvas visibility

When you look at a character, the thought bubble appears. The thought bubble remains visible until you look at another character. I looked at two ways of toggling the menu visibility - setting the active state of the UI container gameObject (ThoughtBubble), or adding a Canvas Group component to the UI container gameObject and setting the Canvas Group's alpha property. Changing the alpha property seemed easier, as I would not need to keep track of inactive gameObjects, so I went with that method. There is a canvas attached to each character in the scene. The script below is attached to the CenterEyeObject (part of the OVRCameraRig prefab in the Oculus Integration package v. 0.4.4). It uses ray casting to detect which person the user is looking at and then changes the alpha value of that character's attached GUI canvas to toggle its visibility.

using UnityEngine;
using System.Collections;

public class lookatthoughts : MonoBehaviour {
    
    private  GameObject displayedObject = null;
    private  GameObject lookedatObject  = null;


    // Use raycasting to see if a person is being looked
    // at and if so, display the person's attached gui canvas
    void Update () {
        Ray ray = new Ray(transform.position, transform.forward);
        RaycastHit hit;

        if (Physics.Raycast(ray, out hit, 100)) {
            if (hit.collider.gameObject.tag == "person"){
                lookedatObject = hit.collider.gameObject;
                if (displayedObject == null){
                    displayedObject = lookedatObject;
                    changeMenuDisplay(displayedObject, 1);
                }else if (displayedObject == lookedatObject){
                    //do nothing
                }else{
                    changeMenuDisplay(displayedObject, 0);
                    displayedObject = lookedatObject;
                    changeMenuDisplay(displayedObject, 1);
                }
            }
        }
    }

    // Toggle the menu display by setting the alpha value
    // of the canvas group
    void changeMenuDisplay(GameObject menu, float alphavalue){

        Transform tempcanvas = FindTransform(menu.transform, "ThoughtBubble");

        if (tempcanvas != null){
            CanvasGroup[] cg;
            cg = tempcanvas.gameObject.GetComponents<CanvasGroup>();
            if (cg != null){
                foreach (CanvasGroup cgs in cg) {
                    cgs.alpha = alphavalue;
                }
            }
        }
    }


    // Find a child transform by name
    public static Transform FindTransform(Transform parent, string name)
    {
        if (parent.name.Equals(name)) return parent;
        foreach (Transform child in parent)
        {
            Transform result = FindTransform(child, name);
            if (result != null) return result;
        }
        return null;
    }
    
}

Wednesday, November 12, 2014

Unity 4: Knowing which user profile is in use

Previous versions of the Unity Integration package did not include a call for getting the user profile name. As of 0.4.3, it is now possible to get the user profile name. To know which profile is being used, you can use GetString(), found in the OVRManager.cs script.

public string GetString(string propertyName, string defaultVal = null)

Below is a simple example script (report.cs) that uses this method to print out the name of the current user profile to the console. To use this script, attach it to an empty game object in a scene that is using the OVRCameraRig or OVRPlayerController prefab. With the Rift connected and powered on, run the scene in the Unity Editor. If default is returned, no user profile has been found.


using UnityEngine;
using System.Collections;
using Ovr;

public class report : MonoBehaviour {
    void Start () {
        Debug.Log(OVRManager.capiHmd.GetString(Hmd.OVR_KEY_USER, ""));
    }
}


The GetString() method found in the OVRManager.cs script is used to get the profile values for the current HMD. The OVRManager.cs script gets a reference to the current HMD, capiHmd. The Hmd class, defined in OvrCapi.cs, provides a number of constants that you can use to get user profile information for the current HMD. In this example, I used OVR_KEY_USER to get the profile name. You could also get the user's height (OVR_KEY_PLAYER_HEIGHT), IPD (OVR_KEY_IPD), or gender (OVR_KEY_GENDER), for example.
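
For example, a variation of report.cs along these lines could log a few more of those values. This is just a sketch: it assumes the Hmd wrapper also exposes a GetFloat() method mirroring the underlying C API, and the fallback values passed in are arbitrary.

using UnityEngine;
using System.Collections;
using Ovr;

public class profilereport : MonoBehaviour {
    void Start () {
        // Assumes Hmd exposes GetFloat() alongside GetString(); the fallbacks are arbitrary
        string user   = OVRManager.capiHmd.GetString(Hmd.OVR_KEY_USER, "");
        float  height = OVRManager.capiHmd.GetFloat(Hmd.OVR_KEY_PLAYER_HEIGHT, 1.778f);
        float  ipd    = OVRManager.capiHmd.GetFloat(Hmd.OVR_KEY_IPD, 0.064f);
        string gender = OVRManager.capiHmd.GetString(Hmd.OVR_KEY_GENDER, "Unspecified");

        Debug.Log(string.Format("User: {0}  Height: {1}m  IPD: {2}m  Gender: {3}",
                                user, height, ipd, gender));
    }
}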

Thursday, November 6, 2014

Thoughts on an alternative approach to distortion correction in the OpenGL pipeline

Despite some of the bad press it's gotten lately, I quite like OpenGL.  However, it has some serious limitations when dealing with the kind of distortion required for VR.

The problem

VR distortion is required because of the lenses in Oculus Rift-style VR headsets.  Put (very) simply, the lenses provide a wide field of view even though the screen isn't actually that large, and make it possible to focus on the screen even though it's very close to your eyes.

However, the lenses introduce curvature into the images seen through them.  If you render a cube in OpenGL that takes up 40° of your field of view, and look at it through the lenses of the Rift, you'll see curvature in the sides, even though they should be straight.

In order to correct for this, the current approach is to render images to textures, and then apply distortion to the textures.  Think of it as painting a scene on a canvas of latex and then stretching the latex onto a curved surface.  The curvature of the surface is the exact inverse of the curvature introduced by the lenses, so when you look at the result through the lens, it no longer appears distorted.
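
As a rough illustration of the kind of mapping involved (the coefficients below are made up, not the Rift's actual calibration values), a radial "barrel" distortion can be expressed as a simple function of a point's distance from the lens center:

using UnityEngine;

// Illustrative only: a generic radial ("barrel") distortion of the kind
// described above. The coefficients are placeholders.
public static class BarrelDistortion {
    const float k1 = 0.22f;
    const float k2 = 0.24f;

    // Takes a point in lens-centered coordinates (roughly [-1,1] on each axis)
    // and returns where that point should sample from in the source texture.
    public static Vector2 Distort(Vector2 p) {
        float rSq = p.sqrMagnitude;                      // squared distance from the lens center
        float scale = 1.0f + k1 * rSq + k2 * rSq * rSq;  // grows toward the edges
        return p * scale;
    }
}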

However, this approach is extremely wasteful.  The required distortion magnifies the center of the image, while shrinking the outer edges.  In order to avoid loss of detail at the center, the source texture you're distorting has to have enough pixels so that at the area of maximum magnification, there is a 1:1 ratio of texture pixels to screen pixels.  But towards the edges, you're shrinking the image, so all your extra rendered pixels are essentially going to waste.  A visual representation of this effect can be seen in my video on dynamic framebuffer scaling below, at about 1:12.




A possible solution...

So how do we render a scene with distortion but without the cost of all those extra pixels that never make it to the screen?  What if we could modify the OpenGL pipeline so that it renders only the pixels actually required?

The modern OpenGL pipeline is extremely configurable, allowing clients to write software for performing most parts of it.  However, one critical piece of the pipeline remains fixed: the rasterizer.  During rendering, the rasterizer is responsible for taking groups of normalized device coordinates (where the screen is represented as a square with X and Y axes going from -1 to 1) representing a triangle and converting them to lists of pixels which need to be rendered by the fragment shaders.  This is still a fixed function because it's the equivalent of picking 3 points on a piece of graph paper and deciding which boxes are inside the triangle.  It's super easy to implement in hardware, and prior to now there hasn't been a compelling reason to mess with it.

But just as the advent of more complex lighting and surface coloring models pushed the old pipeline's fixed-function vertex and fragment stages aside and led to the rise of the current programmable model, the needs of VR give us a reason to add programmability to the rasterizer.

What we need is a way to take the rasterizer's traditional output (a set of pixel coordinates) and displace those pixels based on the required distortion.

What would such a shader look like?  Well, first let's assume that the rasterizer operates in two separate steps.  The first takes the normalized device coordinates (which are all in the range [-1,1] on both axes) and outputs a set of N values that are still in normalized device coordinates.  The second step displaces the output of the first step based on the distortion function.

In GLSL terms, the first step takes three vec3 values (representing a triangle) and outputs N vec3 coordinates.  How large N is depends on how much of the screen the triangle covers and also on the specific resolution of the rasterization operation.  This would not be the same resolution as the screen, for the same reason that we render to a larger-than-screen-resolution texture in the current distortion method.  This component would remain in the fixed function pipeline.  It's basically the same as the graph paper example, but with a specific coordinate system.

The second step would be programmable.  It would consist of a shader with a single vec2 input and a single vec2 output, and would be run for every output of the first step (the vec3's become vec2's because at this point in the pipeline we aren't interacting with depth, so we only need the xy values from the previous step).

in vec2 sourceCoordinate;
out vec2 distortedCoordinate;

void main() {
  // Use the distortion function (or a pre-baked structure) to 
  // compute the output coordinate based on 
  // the input coordinate
}

Essentially this is just a shader that says "If you were going to put this pixel on the screen here, you should instead put it here".  This gives the client the ability to displace the pixels that make up the triangle in exactly the same way they would be displaced using the texture distortion method currently used, but without the cost of running so many extra pixels through the pipeline.

Once OpenGL has all the output coordinates, it can map them to actual screen coordinates.  Where more than one result maps to a single screen coordinate, OpenGL can blend the source pixels together based on each one's level of contribution, and send the result as a single set of attributes to the fragment shader.

The application of such a rasterization shader would be orthogonal to the vertex/fragment/geometry/tessellation shaders, similar to the way compute shaders are independent.   Binding and unbinding a raster shader would have no impact on the currently bound vertex/fragment/geometry/tessellation shader, and vice versa.

Chroma correction

Physical displacement of the pixels is only one part of distortion correction.  The other portion is correction for chromatic aberration, which this approach doesn't cover.

One approach would be to have the raster shader output three different coordinates, one for each color channel.  This isn't appealing, because the likely outcome is that the pipeline then has to run the fragment shader multiple times, grabbing only one color channel from each run.  Since the whole point of this exercise is to avoid running the fragment shader more often than we have to, that defeats the purpose.

Another approach is to add an additional shader to the program that specifically provides the chroma offset for each pixel.  In the same way you must have both a vertex and a fragment shader to create a rendering program in OpenGL, a distortion correction shader might require both a raster and a chroma shader.  This isn't ideal, because only the green channel would be perfectly computed for the output pixel it covers, while the red and blue pixels would cover either slightly more or slightly less of the screen than they actually should.  Still, it's likely that this imperfection would be well below the level of human perception, so maybe it's a reasonable compromise.

Issues

Cracks
You want to avoid situations where two pixels are adjacent in the raster shader but their outputs have a gap between them when mapped to screen pixels.  Similar to the way we use a higher resolution than the screen for textures now, we would use a higher resolution than the screen for the rasterization step, thus ensuring that at the area of greatest magnification due to distortion, no two adjacent input pixels cease to be adjacent when mapped to the actual physical screen resolution.

Merging
An unavoidable consequence of distortion, even without the above resolution increase, is that some pixels that are adjacent in the raster shader inputs will end up with their outputs mapping to the same screen pixel.

Cost 
Depending on the kind of distortion required for a given lens, the calculations called for in the raster shader might be quite complex, and certainly not the kind of thing you'd want to be doing for every pixel of every triangle.  However, that's a fairly easy problem to solve.  When binding a distortion program, the OpenGL driver could precompute the distortion for every pixel, as well as precompute the weight of each rasterizer output pixel relative to the physical screen pixel it eventually gets mapped to.  This computation would only need to be done once for any given combination of raster shader, raster resolution, and viewport resolution.  If OpenGL can be told about symmetry, even more optimization is possible.

You end up doing a lot more linear interpolation of vertex attributes during the rasterization stage, but all this computation is still essentially the same kind of work the existing rasterization stage already does, and it is far less costly than a complex lighting shader executed for a pixel that never gets displayed.

Next steps

  • Writing up something less off the cuff
  • Creating a draft specification for what the actual OpenGL interface would look like
  • Investigating a software OpenGL implementation like Mesa and seeing how hard it would be to prototype an implementation
  • Pestering nVidia for a debug driver I can experiment with
  • Learning how to write a shader compiler
  • Maybe figuring out some way to make someone else do all this


Wednesday, October 22, 2014

Video: Rendering OpenCV captured images in the Rift

In this video, Brad gives a walkthrough of an application that pulls images from a live Rift-mounted webcam and renders them to the display.


Links for this video: