Google Cardboard Camera to YouTube 3D Panorama

Google recently released a pretty neat Android app called Cardboard Camera that uses a single smartphone camera to take 3D panoramic images. It is a damn interesting little piece of tech, and the 3D images it takes are of excellent quality, especially considering that they’re taken with a single lens, but unfortunately the app is very lacking in shareability.

As of 1/29/16, one must use a file browser to pull out the image (with an extension of .vr.jpg) and send it via email or another transfer method. That isn’t all that difficult, except that to actually view it, the receiving user must place it in /DCIM/CardboardCamera on their Android device, where it can only be viewed in the Cardboard Camera app. This is a huge barrier to sharing these images (especially with our iOS and non-mobile brethren), so I set out to fix that problem.

What is a vr.jpg file made up of? It is essentially three files:

  1. left eye .jpg of the panorama taken with Cardboard Camera
  2. right eye .jpg of same
  3. ambient audio .mp4


Thanks to Andrew Perry and his work on analyzing and extracting this data (and his web app) we can easily pull these pieces out and work with them.
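
For the curious, here is a rough Python sketch of how the embedded pieces could be pulled out by hand. This is not Andrew's code. It assumes the extra image and the audio are stored as base64 attributes (I am assuming the names GImage:Data and GAudio:Data) inside Adobe-style "extended XMP" segments of the JPEG; the function names are made up for illustration.

```python
import base64
import re
import struct

# Assumption: header string that marks an extended-XMP APP1 segment.
EXT_HEADER = b"http://ns.adobe.com/xmp/extension/\x00"


def extended_xmp(jpeg_bytes):
    """Reassemble the extended XMP payload spread across APP1 segments."""
    chunks = {}
    pos = 2  # skip the SOI marker (FF D8)
    while pos + 4 <= len(jpeg_bytes) and jpeg_bytes[pos] == 0xFF:
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:  # start of scan: no more metadata segments
            break
        # The 2-byte length counts itself but not the marker bytes.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        body = jpeg_bytes[pos + 4:pos + 2 + length]
        if marker == 0xE1 and body.startswith(EXT_HEADER):
            # Layout: header, 32-byte GUID, 4-byte total size, 4-byte offset.
            meta = len(EXT_HEADER) + 32
            (offset,) = struct.unpack(">I", body[meta + 4:meta + 8])
            chunks[offset] = body[meta + 8:]
        pos += 2 + length
    # Chunks may arrive in any order, so stitch them together by offset.
    return b"".join(chunks[k] for k in sorted(chunks))


def embedded_payload(xmp, attribute):
    """Decode a base64 attribute (e.g. GImage:Data), or return None."""
    pattern = re.escape(attribute.encode()) + rb'="([^"]*)"'
    match = re.search(pattern, xmp)
    if match is None:
        return None
    data = match.group(1)
    data += b"=" * (-len(data) % 4)  # restore padding if it was stripped
    return base64.b64decode(data)
```

With a real file you would read the bytes, call extended_xmp() on them, then write embedded_payload(xmp, "GImage:Data") and embedded_payload(xmp, "GAudio:Data") out to disk. If the attribute names or layout differ in your files, the web app is the safer route.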

What kind of video is a YouTube panorama? Their guidelines are fairly loose:

We recommend uploading 16:9 aspect ratio stitched videos in 4K (3840×2160) resolution.

Since these will be 3D panoramas, we’ll be putting the left- and right-eye views on top of one another (for some reason I couldn’t get side-by-side to work on YouTube), meaning what we need in the end is a 3840×4320 video.

Right off the bat, we run into an issue with this – Cardboard Camera doesn’t actually include anything above or below the field of view of your camera when you take the image. In the Cardboard Camera app, this is dealt with by (what looks like) blurring the top and bottom of the images:

[Image: blurred top]

This blurry portion is then stretched to the top and bottom of your field of view, which ends up looking pretty good in a Google Cardboard or similar VR viewer…

To replicate this for our 3840×2160 per-eye image, we simply scale each view to a horizontal resolution of 3840, then stretch the blurry portion at the top and bottom to fill the full 2160-pixel height and build a “complete” image.
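
The arithmetic behind that step is simple enough to sketch in Python. This only works out the geometry (how tall the scaled panorama is, and how many rows of stretched blur go above and below it); the pixel work itself can be done in any image editor. The function name is made up, and splitting the spare rows evenly between top and bottom is my assumption, not something the app documents.

```python
def eye_layout(src_width, src_height, out_width=3840, out_height=2160):
    """Return (scaled_height, top_band, bottom_band) in pixels."""
    # Scale so the panorama spans the full output width.
    scaled_height = round(src_height * out_width / src_width)
    if scaled_height > out_height:
        raise ValueError("scaled panorama is taller than the target frame")
    # Assumption: the panorama is centred, so the leftover rows are
    # split evenly between the top and bottom blur bands.
    spare = out_height - scaled_height
    top_band = spare // 2
    bottom_band = spare - top_band
    return scaled_height, top_band, bottom_band


# A hypothetical 10000x1500 source panorama:
print(eye_layout(10000, 1500))  # (576, 792, 792)
```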


We do this to both the left and right versions, then stack them (with the left view on top).
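
As a quick sanity check on the numbers, here is a tiny Python sketch of the stacked layout: where each eye's image gets pasted and how big the final frame ends up. The function name and dictionary keys are made up for illustration.

```python
def stacked_layout(eye_width=3840, eye_height=2160):
    """Describe a top-bottom stereo frame with the left eye on top."""
    return {
        "frame": (eye_width, eye_height * 2),  # full size of the stacked image
        "left": (0, 0),                        # paste position of the left eye
        "right": (0, eye_height),              # right eye sits directly below
    }


layout = stacked_layout()
print(layout["frame"])  # (3840, 4320)
```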


Lastly, we need to combine this image with the .mp4 audio file. This actually proved trickier than I would have liked – there are not very many free video editing programs that will let you create a video with a resolution of 3840×4320 (and the ones that can were complicated as hell). So, I turned to a very powerful command line audio and video editing tool, ffmpeg.

Who needs fancy video editing software when combining an image and an audio file into a video is as easy as running a single command? Don’t ask me what all that means, but the gist is “take image.jpg and combine it with audio.mp4 to produce video.mp4”:

ffmpeg -loop 1 -i image.jpg -i audio.mp4 -c:v libx264 -tune stillimage -c:a aac -strict experimental -b:a 192k -pix_fmt yuv420p -shortest video.mp4
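
If you have a pile of panoramas to convert, the same command can be driven from Python. The sketch below builds exactly the argument list above, with my best reading of each flag in the comments. The function names are made up, and it assumes ffmpeg is installed and on your PATH.

```python
import subprocess


def build_ffmpeg_command(image, audio, output):
    """Return the ffmpeg arguments for a still image plus audio track."""
    return [
        "ffmpeg",
        "-loop", "1",               # keep repeating the single input image
        "-i", image,                # input 0: the stacked panorama
        "-i", audio,                # input 1: the ambient audio
        "-c:v", "libx264",          # encode the video as H.264
        "-tune", "stillimage",      # x264 tuning for still-image content
        "-c:a", "aac",              # encode the audio as AAC
        "-strict", "experimental",  # lets older builds use the built-in AAC encoder
        "-b:a", "192k",             # audio bitrate
        "-pix_fmt", "yuv420p",      # widely compatible pixel format
        "-shortest",                # stop when the shortest input (the audio) ends
        output,
    ]


def make_video(image, audio, output):
    """Run ffmpeg and raise CalledProcessError if it fails."""
    subprocess.run(build_ffmpeg_command(image, audio, output), check=True)
```

Calling make_video("image.jpg", "audio.mp4", "video.mp4") should be equivalent to running the command by hand.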

With that, we’re almost done! Run the resulting video through Google’s 360 YouTube Metadata app (or do it via Python script if you want to get fancy) and upload it! I’ve found that you need to wait about 30 minutes to an hour before the panoramic part starts working, and you will probably need to turn on the 3D flag manually in your video’s advanced settings.

Voila! The best part about doing it through YouTube is that the video can be viewed from practically anywhere. Embedded here it should just be a normal 360 video, but if you view it in Google Cardboard, Gear VR, or another 3D viewer, you get a 3D 360 view!