3D Vision with Stereo Disparity
2D is nice, but these days I’m getting interested in doing computer vision in 3D. One way to get 3D data is to use two cameras and determine distance by looking at the differences between the two pictures (just like eyes!). In this project I show some initial results and code for computing disparity from stereo images.
Introduction
UPDATE: Check this recent post for a newer, faster version of this code. The new version no longer relies on mean-shift.
People can see depth because they look at the same scene from two slightly different angles (one from each eye). Our brains then figure out how close things are by determining how far apart they are in the two images from our eyes. The idea here is to do the same thing with a computer. Check this for some information on the geometry and mathematics of stereo vision. First, here are the images I’ll use to show results.
These two images are slightly different. The top one is from the left and the bottom is from the right. It’s a bit hard to see the disparity like this, so here are the same two images placed “on top” of one another.
You can see that the close-up objects like the lamp are very misaligned in the two images, while the farther-away things like the poster and the camera are lined up better. The greater the misalignment, the closer the object.
This pair of images is one of many standard stereo pairs that can be found at the Middlebury stereo vision site. These guys keep a compendium of standard datasets as well as a scoreboard of whose algorithms work the best. The algorithm I talk about here is a knock-off of the one that was on top in December 2007: “Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure” [PDF] by Klaus, Sormann, and Karner. (Mind that the algorithm here is *inspired* by the algorithm of Klaus et al. Theirs is much more complete.)
Getting Pixel Disparity
The first step here is to get an estimate of the disparity at each pixel in the image. A reference image is chosen (in this case, the right image), and the other image slides across it. As the two images “slide” over one another, we subtract their intensity values. We also subtract their gradient information (spatial derivatives). Combining the two differences gives better accuracy, especially on surfaces with texture. In the video below, we can see a visualization of this process. You’ll notice how far-away objects go dark (meaning they line up in the two images) at different times than close-up objects. For each pixel, we record the offset at which the difference is smallest, along with the value of that difference.
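As a rough illustration of the slide-and-subtract idea, here is a minimal Matlab sketch. It is not the posted code: the synthetic image pair, the variable names, and the value of weight are assumptions made for the example, and the window aggregation used in the real implementation is left out to keep it short.

```matlab
% Sketch of slide-and-subtract matching (all names and values are illustrative).
% Synthetic pair: the "left" image is the "right" image shifted by 3 pixels.
rng(0);
right = rand(40, 60);
true_shift = 3;
left = circshift(right, [0 true_shift]);

maxs   = 8;     % largest shift to try, in pixels
weight = 0.5;   % assumed mixing weight for the gradient term

[gxR, gyR] = gradient(right);
[h, w] = size(right);
mindiff   = inf(h, w);    % smallest difference seen so far at each pixel
disparity = zeros(h, w);  % shift at which that smallest difference occurred

for s = 0:maxs
    shifted = circshift(left, [0 -s]);        % slide the left image over the right
    [gxS, gyS] = gradient(shifted);
    csad  = abs(right - shifted);             % intensity difference
    cgrad = abs(gxR - gxS) + abs(gyR - gyS);  % gradient difference
    d = csad + weight * cgrad;                % combined dissimilarity
    better = d < mindiff;                     % pixels that match better at this shift
    mindiff(better)   = d(better);
    disparity(better) = s;
end
```

Because the synthetic shift wraps around at the border, every pixel here recovers the true shift. Real images have occlusions and borders, which is why the later clean-up steps are needed.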
We perform this slide-and-subtract operation from right-to-left (R-L) and left-to-right (L-R). Then we try to eliminate bad pixels in two ways. First, at each pixel we use the disparity from the R-L pass or the L-R pass, whichever has the lower matching difference. Second, we mark as bad all points where the R-L disparity is significantly different from the L-R disparity. The result is a pixel disparity map.
In this image, redder colors indicate closer pixels, and bluer colors represent pixels that are farther away.
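The two clean-up rules can be sketched in a few lines of Matlab. This is only an illustration under assumptions: the small input matrices are made up, the threshold tol for “significantly different” is a guess, and marking bad pixels with NaN is one possible convention, not necessarily what the posted code does.

```matlab
% Illustrative inputs: disparity and matching difference from each pass.
disparity1 = [5 5 9; 2 2 2];             % R-L pass
mindiff1   = [0.1 0.4 0.2; 0.3 0.1 0.5];
disparity2 = [5 6 3; 2 2 7];             % L-R pass, expressed in the same frame
mindiff2   = [0.2 0.1 0.6; 0.1 0.2 0.1];

tol = 1;  % assumed threshold for "significantly different"

% 1) Per pixel, keep the disparity whose matching difference is lower.
use2 = mindiff2 < mindiff1;
pixel_dsp = disparity1;
pixel_dsp(use2) = disparity2(use2);

% 2) Mark pixels where the two passes disagree by more than tol.
bad = abs(disparity1 - disparity2) > tol;
pixel_dsp(bad) = NaN;
```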
Filtering the Pixel Disparity
In the next step, we combine image information with the pixel disparities to get a cleaner disparity map. First, we segment the reference image using a technique called “Mean Shift Segmentation.” This is a clustering algorithm that “over-segments” the image. The result is a very ‘blocky’ version of the original image.
Then, for each segment, we look at the associated pixel disparities. In my simple implementation, we assign each segment to have the median disparity of all the pixels within that segment. This gives the final result:
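The per-segment median step might look like the following sketch. The label map here is made up by hand; in practice it would come from the segmentation step, and the variable names are assumptions for the example.

```matlab
% labels: segment index for each pixel (hand-made here for illustration)
labels = [1 1 2 2;
          1 1 2 2;
          3 3 3 3];
% pixel_dsp: noisy per-pixel disparities
pixel_dsp = [4 4 9 9;
             4 7 9 1;
             2 2 3 2];

filtered = zeros(size(pixel_dsp));
for k = unique(labels(:))'
    mask = (labels == k);                      % pixels belonging to segment k
    filtered(mask) = median(pixel_dsp(mask));  % assign the segment's median
end
```

The outliers (the 7, the 1, and the 3) are replaced by the median of their segment, which is what produces the cleaner, blockier disparity map.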
Here again, the red colors are close objects, and blue objects are far away.
Matlab Code
I spent some time getting these simple ideas into working form. I’ve posted the code and images I used as well as a demo script. To see how the code works, simply un-archive everything and run demo from a Matlab prompt. Enjoy, and let me know how it works for you.
stereo_modefilt.zip (New! Described here.)
lankton_stereo.tar.gz(Old)
Check out the newest results:
Conclusion
This stereo algorithm is just a tool to be used on other projects. For instance, by computing the stereo disparity of a stereoscopic video it is possible to improve tracking results by using the 3D information. Also, segmentations can be made more accurate if 3D information is known.
I wanted to put this up to introduce people to stereo vision (as this was my introductory project). Hopefully, the explanation and code above will save you some time getting up to speed. If you find this helpful and make improvements, please let me know!
There are sometimes problems using the mean shift segmentation code (which is written in C++/mex). Try the code in this related post, which is similar but may compile better.
Hi Shawn,
The performance is outstanding only on the couple of samples above.
I tried other Middlebury stereo image pairs, but the outcome was not what I expected. Any suggestions?
Brian
There are a few things you need to watch out for:
1) Make sure the left and right images aren’t swapped. If so you’ll get garbage.
2) You need to set the maxs variable correctly. This variable is the maximum shift between the two images.
3) The disparity between the images can’t be too large. Best results are when the max shift is between 20 and 40 pixels.
Hope that helps!
-Shawn
Shawn,
The stereo file in your zip folder states the following:
“The output here is pixel disparity, which can be converted to actual distance from the cameras if information about the camera geometry is known.”
I am a bit confused as to how you would actually convert the disparity map into distance information. Is there any way you could help shed some light on this? Thanks!
Dear Shawn.
I am working on a project that involves the acquisition of depth values from a 2D image. The specific problem is that the right and left images are acquired in the same frame (through a prism and two mirrors). I am having some problems applying this code to my images. I would like to know your opinion about this.
Thanks
Your work is nice, but I think you did not use your own camera images? To answer some of the questions most people have beginning with their own stereo camera (as I did), I have made available my first steps here:
http://grauonline.de/wordpress/?p=5
Hopefully, this might help others to get started with using your own stereo camera :-)
good work!
After lots of searching and reading, I found that there is a lot of research on disparity map generation and on 3D reconstruction from sparse or dense matches, but I cannot find even one example of the whole process, from disparity mapping through to reconstructing a 3D model.
With the image pair above we can get a disparity map, but we cannot get a 3D model because the camera matrix is unknown.
Have you finished the whole process? What do the results of reconstruction from the disparity map look like?
Dear Shawn:
I am doing my master’s thesis on disparity estimation and am currently testing your code. Which part of your code does the plane fitting step and the plane extraction step described in your reference paper?
thanks,
Xin
hey, Shawn, do you have the ground-truth to this testing image, for the real cost computation purpose.
thanks,
xin
The ground truths for these images are available on the Middlebury site.
Thanks Shawn. Recently I tried using a polynomial model for the plane fitting step. I find that an order 1 model gives better disparity estimation but order 2 is worse, and I am looking for the reason. I judge the quality by calculating the PSNR.
Hey Shawn, do you have a journal paper or other publication describing the algorithm in your Matlab code in more detail? I find that not many details are discussed in the write-up on this website. Thanks.
hi Shawn,
I have a question about your code. After this statement:
pixel_dsp = winner_take_all(disparity1, mindiff1, disparity2, mindiff2);
there is only one disparity map output, pixel_dsp. Does this variable mean the disparity from the right image to the left image? That is, it tells how a pixel in the right image maps to the left image, right?
And you have not performed L-R checking to exclude the outliers in this step, right? You just choose the more reliable disparity value from the disparity1 and disparity2 matrices depending on the mindiff value for each pixel, right?
So if I want to reconstruct the right image (i1) in a backward way, I could use pixel_dsp and i2 to reconstruct i1, and then check the MSE or PSNR between them to measure how well the disparity estimation code performs, right?
And the last question: do you have any suggestions about how to exclude the outliers in the final disparity map?
Thanks a lot for your advice,
xin
hi Shawn,
I want to ask you about your code.
In your code, what is the distance between the cameras, and how can I change it, please?
I get an unhandled exception using edison_wrapper. Anyone else experiencing this?
MATLAB Version: 7.7.0.471 (R2008b)
Operating System: Microsoft Windows Vista
Window System: Version 6.0 (Build 6002: Service Pack 2)
Processor ID: x86 Family 6 Model 7 Stepping 6, GenuineIntel
Virtual Machine: Java 1.6.0_04 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
Default Encoding: windows-1252
Vista x64 might be the problem, but I have recompiled the wrapper.
Any help is appreciated. Thanks in advance.
hi Shawn
Thank you for this code. I want to ask you a few things.
How is the code executed, and how many stages are there (and what are they) to get the resulting image?
What is the purpose of each file?
Why do most of the files refer to msseg?
To run the code, please untar the files, open Matlab, and navigate to the lankton_stereo directory. Then run
>>demo
Note that many people have had problems running the mean shift segmentation code (which I did not write). For equally satisfactory results (in my opinion), try the code found here: Fast Stereo Disparity Code.
hey Shawn,
Thanks a lot for your code. Could you tell me how to proceed after running it, if my aim is to make a 3D reconstructed image once I know the stereo disparity?
Hi Shawn, thank you very much for your work!!! I’d like to save the disparity map in the working directory to process it for making a 3D reconstruction of an image. I’ve used the command imwrite(d,’image,jpg’) and it didn’t work properly.. any ideas? thank you in advance
I would try saving using the ’save’ command. This will save a binary copy of the data to your hard drive. The ‘imwrite’ command will save an image (in whatever format), but this will alter the disparity data stored in the stereo output. The binary version should preserve the original data exactly. Use:
>>help save
>>help load
to learn how to use this command.
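For example, a save/load round trip could look like the sketch below. The variable d and the file names are placeholders, and the tiny matrix stands in for a real disparity map. The reason imwrite alters the data is that it treats double-valued input as intensities in the range [0,1] and quantizes it, and JPEG compression is lossy on top of that.

```matlab
d = [0 12.5; 37 255.75];                     % stand-in for a disparity map
fname = fullfile(tempdir, 'disparity_demo.mat');

save(fname, 'd');                            % binary copy, values preserved exactly
s = load(fname);                             % load into a struct to avoid name clashes
d_loaded = s.d;

% Optional: a viewable picture, scaled to [0,1] first (not for further processing).
imwrite(d / max(d(:)), fullfile(tempdir, 'disparity_view.png'));
```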
Hi Shawn, how do I convert pixel disparity to actual distance using the camera geometry?
Hi Shawn, can you explain more about your stereo m-file, line 59, “pd = shift_image(pixel_dsp,5); “. Why is the shift value 5?
@Philip. This line is kind of a hack. It just centers the disparity image between the two reflectance images.
@Philip. It is not too complicated. Check out the book, “Multiple View Geometry in Computer Vision” by Richard Hartley, Andrew Zisserman.
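For the common special case of a rectified pair from two parallel cameras, the relation is short: depth Z = f * B / d, where f is the focal length in pixels, B is the distance between the cameras (the baseline), and d is the disparity in pixels. The sketch below uses made-up values for f and B. Neither number is part of the posted code; they come from your own camera setup and calibration.

```matlab
f = 500;     % focal length in pixels (assumed value)
B = 0.10;    % baseline between the cameras in meters (assumed value)

dsp = [10 20;
        0 50];                 % example disparities in pixels

depth = nan(size(dsp));        % NaN where depth is undefined
valid = dsp > 0;               % zero disparity means "infinitely far" or no match
depth(valid) = f * B ./ dsp(valid);   % depth in meters
```

Larger disparities give smaller depths, which matches the observation that the most misaligned objects are the closest.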
Hi Shawn, thank you for your effort, it is nice work.
I ran the stereo_nofilter function, and I need your help to understand why you used the gradient function in your program. What do you mean by this line?
d = CSAD+weight*CGRAD;
I wrote a program to compute the disparity map with another technique: it uses a 3×3, 5×5, or 7×7 window and calculates the disparity for the center pixel of the window. But you calculate the disparity for the whole image at once!
emy
Sorry, but I have another question. You say above, “First, we segment the reference image using a technique called ‘Mean Shift Segmentation’,” but I followed the mean shift segmentation link and I can’t find the Matlab code.
emy
@emy
That dissimilarity is inspired by the Klaus, Sormann and Karner paper. They use a combination of sum of absolute differences, and gradient features to produce reliable disparities. The SAD measure is good if you assume that the surface is Lambertian, and the gradient features are good for surfaces that aren’t subject to Lambertian properties, but they have poor discriminating power. So these authors combined both of these similarity measures together to produce a final dissimilarity function, and they use local matching methods with this function to compute a disparity map.
Some of the things in the paper that are key elements to the algorithm they conveniently left out… such as the disparity plane fitting. There’s a lot more math going on than in those two paragraphs they wrote, and it’s kind of disconcerting that they left it out.
Also, the mean-shift code is there. You have to go to Shai Bagon’s website and download the MATLAB wrapper. Shawn didn’t implement the code, but used the MATLAB wrapper instead. He also wrote a wrapper on top of the wrapper to make things easier.
- Ray.
@ Ray
Thank you so much for your interest and your answers.
I went to Shai Bagon’s website and downloaded the MATLAB wrapper, but some functions are undefined, like featurefun and im2single, so I get many errors. I’m using MATLAB Version 7.0.0.19920 (R14). Can you help me?
I have another problem: I can’t get mex to work after setup!
regards
emy
Do you have any ideas about 3D view using Matlab?
@emy
I’m running MATLAB 7.8.0 (R2009a) and the function exists in my version… you probably have an outdated version of MATLAB. I know R14 was a version of MATLAB as of around 2005. I kept on getting run-time errors and it wouldn’t start when I tried installing it on my PC! Try typing in help im2single to see if the function exists for you. The function converts the image to single precision.
When you run mex -setup, make sure you choose either the Microsoft Visual C++ compiler in Windows, gcc in Linux, or mexopts in the MACOSX environment. Their default compiler doesn’t seem to work very well.
As for 3DView, I’m not too familiar with it. What is it supposed to do?
I pretty much got the code working. However, is there any way that I can convert the modefilt C++ code to MATLAB code?
@Sodrohu
The code for modefilt is straightforward and should be easy to write in Matlab, but it would slow down the execution considerably
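A plain Matlab version might look like the sketch below. It assumes that modefilt replaces each pixel with the most frequent value in a square window around it; that is a guess from the name, not a reading of the C++ source, and the window size and border handling here are arbitrary choices.

```matlab
img = [1 1 1 5;
       1 9 1 5;
       1 1 5 5;
       2 2 5 5];               % small example "disparity map"
win = 3;                       % odd window width (assumed)
r = (win - 1) / 2;
[h, w] = size(img);
out = zeros(h, w);

for i = 1:h
    for j = 1:w
        rows = max(1, i - r):min(h, i + r);   % clip the window at the borders
        cols = max(1, j - r):min(w, j + r);
        patch = img(rows, cols);
        out(i, j) = mode(patch(:));           % most frequent value in the window
    end
end
```

The double loop is what makes a pure Matlab version slow on full-size images compared with compiled code.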
I used your Matlab 3D Stereo Disparity code in order to calculate the disparity of my pictures.
Your demo is great, but when I use my pictures, I have a problem.
Your tsuL.png and tsuR.png are 288×384×3 and my pictures are 383×512.
Matlab tells me:
dsp = stereo_nofilter (img_R, img_L, 20);
Error in ==> gradient at 49
[Msg, f, ndim, loc, rflag] = parse_inputs (f, varargin);
?? Output argument “varargout (3)” (and maybe others) not assigned during call to “C: \ Software \ MATLAB \ toolbox \ matlab \ datafun \ gradient.m> gradient”.
Error in ==> stereo_nofilter> slide_images at 63
[G1X g1y g1z] = gradient (double (i1))% – get gradient for each image
Error in ==> stereo_nofilter at 38
[DSP1, diff1] = slide_images (i1, i2, 1, MAXS win_size, weight);
—————-
What can I do?
Thanks for your help.
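One likely cause, judging only from the error text: Matlab’s gradient returns one output per array dimension, so asking for three outputs from a 2-D (grayscale) 383×512 image fails, while the 288×384×3 demo images work. A possible workaround, sketched below under that assumption, is to replicate the grayscale image into three identical channels before passing it in. The variable names are placeholders.

```matlab
img = rand(383, 512);               % stand-in for a grayscale (2-D) image

if ndims(img) == 2
    img3 = repmat(img, [1 1 3]);    % replicate into three identical channels
else
    img3 = img;                     % already three-dimensional
end

% Three outputs now match the three dimensions of the input.
[gx, gy, gz] = gradient(double(img3));
```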