The change to the model-view matrix is fairly easy to understand. The model-view represents the camera transformation, and when you're rendering in stereo you basically want two camera positions, corresponding to the left and right eye. If you look into the SDK, you see that the amount you translate corresponds to the configured interpupillary distance, or IPD. This makes sense, because the IPD is a configurable value based on a person's actual physical characteristics.
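As a rough illustration of the per-eye camera offset, here is a minimal sketch in plain Python. It assumes each eye sits half the IPD away from the center of the head, that the matrices are row-major and multiply column vectors, and that the left eye's view shifts the world in +x (moving the camera left moves the scene right). The names `IPD_METERS`, `translation`, `matmul` and `eye_modelview` are made up for this example and are not SDK identifiers, and the 0.064 m value is only a placeholder for the user's configured IPD.

```python
# Assumed placeholder; the real value comes from the user's configured IPD.
IPD_METERS = 0.064


def translation(x, y, z):
    """Return a 4x4 translation matrix (row-major, for column vectors)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def eye_modelview(modelview, eye, ipd=IPD_METERS):
    """Return the model-view matrix for one eye ("left" or "right").

    Assumption: each eye is offset by half the IPD. Moving the camera
    left is the same as moving the world right, so the left eye gets +x.
    """
    half = ipd / 2.0
    offset = half if eye == "left" else -half
    return matmul(translation(offset, 0.0, 0.0), modelview)
```

Calling `eye_modelview(center, "left")` and `eye_modelview(center, "right")` with the same center model-view matrix gives two cameras separated by the full IPD.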
The projection matrix transform is a bit harder to explain. The projection matrix is built from the desired field of view, the aspect ratio, and the near and far clipping planes, which together define the view frustum.
Ignoring the near and far clipping planes for a moment, the easiest way to conceptualize the view frustum is to think of sitting in a room with a window to the outside world. (Assume that you're looking straight through the window, for now.) If you were sitting outside, you'd be able to see the entire environment, but if you're inside and looking through a window, that window boxes what you can see. The size of the window corresponds to the 'field of view' component of a projection matrix, i.e. how many degrees you can see in each direction. The aspect ratio of the window corresponds to the aspect ratio of the projection.
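To make the four inputs concrete, here is a sketch of a symmetric perspective matrix using the conventional OpenGL-style formula (the same one `gluPerspective` uses). It assumes a vertical field of view in degrees, a right-handed view space looking down -z, and row-major matrices multiplying column vectors. The function names `perspective` and `to_ndc` are just for this example.

```python
import math


def perspective(fov_y_degrees, aspect, near, far):
    """Build a symmetric OpenGL-style perspective projection matrix."""
    # f is the cotangent of half the vertical field of view: the "window size".
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_ndc(matrix, point):
    """Project a 3D view-space point and return normalized device coords."""
    x, y, z = point
    v = [x, y, z, 1.0]
    clip = [sum(matrix[i][k] * v[k] for k in range(4)) for i in range(4)]
    # Perspective divide by the clip-space w component.
    return [c / clip[3] for c in clip[:3]]
```

With this matrix, a point straight ahead of the camera always projects to the horizontal and vertical center of the viewport, which is the symmetric "window directly in front of you" case.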
That's all well and good, but what happens when things start to shift (translate)? Well, imagine if the window could move around within the wall. If the window moved a little bit to the left, you'd be able to see more of the outside world in that direction, but less of it to the right. That's what the projection matrix's translation component does.
This is important, because the view from the Rift is not symmetrical. If you plot out a diagram of the Rift you can see that there's more screen space to the left of the left eye than to the right of the left eye.
Those tick marks aren't arbitrary. They're at 30, 45 and 60 degree angles from the center of the view. You can clearly see that the top and bottom show the 60 degree mark at the very edge of the screen, indicating a full vertical field of view of about 120 degrees. On the inward edge of each eye, though, the view barely reaches the 45 degree mark, while the outward edge goes well past 45 degrees and in fact almost reaches the 60 degree mark.
This has the effect of shifting the contents of the scene 'inward', towards the center of the display. This is critical, because without this effect, the cubes would each be centered in their respective viewports, rather than centered directly under the lens axis. Both of these effects amount to the same thing: correcting the rendered scene to account for the fact that your rendering target is asymmetrical on the horizontal axis. (A common trait amongst binocular humans.)
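The inward shift can be sketched numerically. The example below assumes the shift is applied as a horizontal translation multiplied onto a symmetric projection matrix, with the left eye shifted in +x (towards the center of the display) and the right eye in -x. The value `PROJECTION_OFFSET = 0.15` is a made-up placeholder, not the real figure for any headset, and all function names are hypothetical.

```python
import math

# Hypothetical offset in normalized device coordinates, for illustration only.
PROJECTION_OFFSET = 0.15


def perspective(fov_y_degrees, aspect, near, far):
    """Build a symmetric OpenGL-style perspective projection matrix."""
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def translation(x, y, z):
    """Return a 4x4 translation matrix (row-major, for column vectors)."""
    return [
        [1.0, 0.0, 0.0, x],
        [0.0, 1.0, 0.0, y],
        [0.0, 0.0, 1.0, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def eye_projection(center_projection, eye, offset=PROJECTION_OFFSET):
    """Shift a symmetric projection inward for one eye ("left" or "right")."""
    shift = offset if eye == "left" else -offset
    # The translation is applied after projection, so it moves the image
    # sideways on the screen instead of moving the camera.
    return matmul(translation(shift, 0.0, 0.0), center_projection)


def ndc_x(matrix, point):
    """Return the horizontal normalized device coordinate of a point."""
    x, y, z = point
    v = [x, y, z, 1.0]
    clip = [sum(matrix[i][k] * v[k] for k in range(4)) for i in range(4)]
    return clip[0] / clip[3]
```

Under these assumptions, a point directly in front of the camera no longer lands at the center of the viewport: `ndc_x` returns `+offset` for the left eye and `-offset` for the right eye, at any depth. That is the "window moving within the wall" from the earlier analogy.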
It's important to understand this asymmetry, especially if you intend to support multiple kinds of stereo output in your application, or are porting an existing application that already has stereo support.
The model-view translation is required for any kind of stereo rendering, but if you intend to produce more conventional stereoscopic images, or support something like side-by-side stereo output for a standard 3D monitor, you must not include the binocular projection translation.
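One way to keep this rule straight in an application that supports both kinds of output is to decide the two offsets separately per output mode. The sketch below is only an illustration of that idea: `StereoMode`, `per_eye_offsets` and the default values are hypothetical, and it assumes the same half-IPD and sign conventions as a typical left/right setup.

```python
from enum import Enum


class StereoMode(Enum):
    RIFT = "rift"                  # lens-based headset output
    SIDE_BY_SIDE = "side_by_side"  # conventional 3D monitor output


def per_eye_offsets(mode, eye, ipd=0.064, projection_offset=0.15):
    """Return (modelview_shift, projection_shift) for one eye.

    The default ipd and projection_offset are placeholders, not real
    device values.
    """
    sign = 1.0 if eye == "left" else -1.0
    # The camera separation applies to every kind of stereo rendering.
    modelview_shift = sign * ipd / 2.0
    # The projection shift only applies when rendering for the Rift.
    if mode is StereoMode.RIFT:
        projection_shift = sign * projection_offset
    else:
        projection_shift = 0.0
    return modelview_shift, projection_shift
```

Both modes return the same model-view shift; only the Rift mode returns a non-zero projection shift.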
Unfortunately, the existing OculusSDK examples don't demonstrate this. They let you render without the distortion (by hitting the F2 key in OculusWorldDemo or OculusRoomTiny), but they continue to modify the projection matrix, so the resulting side-by-side stereo image isn't really suitable for anything. It doesn't have the distortion required by the Rift, and it won't display correctly on an app or piece of hardware designed to present side-by-side stereoscopic images in 3D.
Coming soon: A demonstration of presenting pre-existing conventional stereoscopic content on the Rift.