2D is nice, but these days I’m getting interested in doing computer vision in 3D. One way to get 3D data is to use two cameras and determine distance from the differences between the two pictures (just like eyes!). In this project I show some initial results and code for computing disparity from stereo images.

Introduction

UPDATE: Check this recent post for a newer, faster version of this code. The new version no longer relies on mean-shift.

People can see depth because they look at the same scene from two slightly different angles (one from each eye). Our brains then figure out how close things are by determining how far apart they appear in the two images from our eyes. The idea here is to do the same thing with a computer. Check this for some information on the geometry and mathematics of stereo vision. First, here are the images I’ll use to show results.

These two images are slightly different. The top one is from the left and the bottom is from the right. It’s a bit hard to see the disparity like this, so here are the same two images placed “on top” of one another.

You can see that the close-up objects like the lamp are very misaligned in the two images, while the farther-away things like the poster and the camera are lined up better. The greater the misalignment, the closer the object.
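The inverse relationship between misalignment and distance can be made concrete. As a rough sketch, assuming rectified images from two parallel pinhole cameras, depth is the focal length (in pixels) times the camera baseline, divided by the disparity (in pixels). The function name and the numbers in the usage note are hypothetical and are not taken from the Matlab code posted here.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline):
    """Depth of a point from its disparity, in the same units as the baseline.

    Assumes rectified images from two parallel pinhole cameras.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_length_px * baseline / disparity_px
```

For example, with a hypothetical 700-pixel focal length and a 0.07 m baseline, doubling the disparity from 7 to 14 pixels halves the estimated depth.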

This pair of images is one of many standard stereo pairs that can be found at the Middlebury stereo vision site. These guys keep a compendium of standard datasets as well as a scoreboard of whose algorithms work the best. The algorithm I talk about here is a knock-off of the one that was on top in December 2007: “Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure” [PDF] by Klaus, Sormann, and Karner. (Mind that the algorithm here is *inspired* by the algorithm of Klaus et al. Theirs is much more complete.)

Getting Pixel Disparity

The first step here is to get an estimate of the disparity at each pixel in the image. A reference image is chosen (in this case, the right image), and the other image slides across it. As the two images ‘slide’ over one another we subtract their intensity values. We also subtract their gradients (spatial derivatives). Combining the two differences gives better accuracy, especially on surfaces with texture. In the video below, we can see a visualization of this process. You’ll notice how far-away objects go dark (meaning they line up in the two images) at different times than close-up objects. For each pixel, we record the offset at which the difference is smallest, as well as the value of that difference.
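Here is a minimal sketch of the slide-and-subtract idea in plain Python. It is not the Matlab code from this post: images are assumed to be lists of rows of grayscale values, the cost is a simple per-pixel sum of the absolute intensity difference and a weighted absolute gradient difference, and the names `slide_and_subtract`, `max_shift`, and `grad_weight` are hypothetical.

```python
def horizontal_gradient(img):
    """Central difference along x; at the borders the edge pixel is repeated."""
    grad = []
    for row in img:
        w = len(row)
        grad.append([(row[min(x + 1, w - 1)] - row[max(x - 1, 0)]) / 2.0
                     for x in range(w)])
    return grad


def slide_and_subtract(ref, other, max_shift, grad_weight=1.0):
    """Return (best_shift, best_cost) for every pixel of the reference image.

    With the right image as reference, a pixel at column x is compared with
    column x + d of the left image for each shift d in 0..max_shift.
    """
    h, w = len(ref), len(ref[0])
    g_ref, g_oth = horizontal_gradient(ref), horizontal_gradient(other)
    best_cost = [[float("inf")] * w for _ in range(h)]
    best_shift = [[0] * w for _ in range(h)]
    for d in range(max_shift + 1):
        for y in range(h):
            for x in range(w - d):
                cost = (abs(ref[y][x] - other[y][x + d])
                        + grad_weight * abs(g_ref[y][x] - g_oth[y][x + d]))
                # Keep the shift with the smallest difference seen so far.
                if cost < best_cost[y][x]:
                    best_cost[y][x] = cost
                    best_shift[y][x] = d
    return best_shift, best_cost
```

Pixels near the image edge are only tested against the shifts that stay inside the other image. A practical version would typically also sum the cost over a small window around each pixel before picking the best shift.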

We perform this slide-and-subtract operation from right-to-left (R-L) and left-to-right (L-R). Then we try to eliminate bad pixels in two ways. First, at each pixel we use the disparity from the R-L pass or the L-R pass, whichever has the lower matching difference. Next, we mark as bad all points where the R-L disparity is significantly different from the L-R disparity. Finally, we are left with a pixel disparity map.
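The two clean-up rules can be sketched as follows. This is an illustration under stated assumptions rather than the posted Matlab code: both disparity maps are assumed to be already expressed in the reference image’s coordinates, “significantly different” is modeled by a hypothetical threshold `tol`, and bad pixels are marked with `None`.

```python
def merge_passes(d_rl, c_rl, d_lr, c_lr, tol=1, bad=None):
    """Combine the R-L and L-R passes into one pixel disparity map.

    d_* are disparity maps and c_* the matching differences recorded with them.
    """
    merged = []
    for y in range(len(d_rl)):
        row = []
        for x in range(len(d_rl[0])):
            if abs(d_rl[y][x] - d_lr[y][x]) > tol:
                # The two passes disagree too much: mark the pixel as bad.
                row.append(bad)
            elif c_rl[y][x] <= c_lr[y][x]:
                # Otherwise keep the disparity with the lower matching difference.
                row.append(d_rl[y][x])
            else:
                row.append(d_lr[y][x])
        merged.append(row)
    return merged
```

Marking unreliable pixels explicitly, instead of guessing a value for them, lets the later filtering step ignore them.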

In this image, redder colors indicate closer pixels, and bluer colors represent pixels that are farther away.

Filtering the Pixel Disparity

In the next step, we combine image information with the pixel disparities to get a cleaner disparity map. First, we segment the reference image using a technique called “Mean Shift Segmentation.” This is a clustering algorithm that “over-segments” the image. The result is a very ‘blocky’ version of the original image.

Then, for each segment, we look at the associated pixel disparities. In my simple implementation, we assign each segment to have the median disparity of all the pixels within that segment. This gives the final result:
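The per-segment median step can be sketched like this. It assumes the segmentation is available as a map of integer segment labels with the same size as the disparity map, and that bad pixels are marked with `None`. The mean-shift segmentation itself is not shown, and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_segment(labels, disparity, bad=None):
    """Replace every pixel's disparity by the median over its segment."""
    per_segment = defaultdict(list)
    for label_row, disp_row in zip(labels, disparity):
        for label, d in zip(label_row, disp_row):
            if d is not bad:
                # Only valid pixels vote for their segment's disparity.
                per_segment[label].append(d)
    # Segments with no valid pixels are absent here and stay marked as bad.
    seg_median = {label: median(vals) for label, vals in per_segment.items()}
    return [[seg_median.get(label, bad) for label in label_row]
            for label_row in labels]
```

The median is a reasonable choice here because a few wrong pixel disparities inside a segment do not pull the result away from the majority value.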

Here again, the red colors are close objects, and blue objects are far away.

Matlab Code

I spent some time getting these simple ideas into working form. I’ve posted the code and images I used as well as a demo script. To see how the code works, simply un-archive everything and run demo from a Matlab prompt. Enjoy, and let me know how it works for you.

[red indicates close, blue indicates far away]

Conclusion

This stereo algorithm is just a tool to be used on other projects. For instance, by computing the stereo disparity of a stereoscopic video it is possible to improve tracking results by using the 3D information. Also, segmentations can be made more accurate if 3D information is known.

I wanted to put this up to introduce people to stereo vision (as this was my introductory project). Hopefully, the words and the code above will save you some time getting up to speed. If you find this helpful and make improvements, I would ask that you let me know!

91 thoughts on “3D Vision with Stereo Disparity”

1. Anonymous says:

@Shawn Lankton
thank you very much sir
I am using a 70 mm distance between the two cameras. The basic formula for calculating the distance is distance = (focal length * camera baseline) / disparity. In your disparity code, which variable (dsp, pixel_dsp, or another) holds the actual disparity in pixels, and how do I calculate the focal length? Could you also tell me the Matlab code for the above expression?

2. Anonymous says:

@nagarjuna
Dear friend, my work is similar to yours. I would be so thankful if you could help me with your final results in measuring the depth of a specified object from 2D images.

3. KA says:

I rectified the images, and then fed the rectified images to the 3D stereo disparity algorithm. But on the filtered output I am not seeing the outline of the objects in the image (I am seeing noise). But the nearest object is hotter than those further away, but no object outline is visible.

What can be done?
Thanks again

4. @KA
Perhaps you could email me one of the disparity outputs you’re getting so I can see what you’re dealing with.

5. Hmanshu says:

hello shawn….
I am working on project 3D object recognition using stereovision.
I am having problems getting the disparity map. Can you please tell me how to select the maximum disparity value? Is there some formula to get it?
And one more thing: what should the maximum distance between the two cameras be when building a stereo vision system?
thank u….

6. Anonymous says:

@Hmanshu A dirty but useful way would be to go find “Template Matching” code. You can not only use it to rectify any vertical translation, but you can allow the code to rectify in the horizontal direction. That horizontal translation in pixels could be used for your max disparity. Keep in mind, you do not want to actually use the images that were rectified in the horizontal as your input images to Lankton’s code.. or any disparity algorithms.

7. Anonymous says:

@Hmanshu Same anonymous poster. Actually, a much easier way I just remembered is to look at an image and find the closest point to the camera. Record the pixel x dimension. Now find that same point in the second image and record its same x dimension. The difference of the two is disparity. This is the core concept of ‘disparity’, but since it is the closest to the camera, that will be the highest disparity value you can get.

8. XiasiLiang says:

Dear friends, I am studying how to get depth from two defocused images, and I don’t know how to compute the depth of each pixel from the depth map. I hope you can give me some advice about how to compute the depth from the depth map. Thank you very much!

9. Vishnu says:

Hi, I’m getting an error “Undefined function or method ‘modefilt2_mex’ for input arguments of type ‘double'” when I run this program. Any suggestions ?

10. Vimal says:

Hi vishnu.. i am facing the almost same problem. Can you tell me how to open the modefilt2_mex file

@Vishnu

11. Vimal says:

Hi….how can i open mex file??

12. Tejal says:

Hi!
can we do the same job using one image?
i.e. monocular depth perception

can you help me out in separating out different planes in the image using edge and monocular cues?

13. Vimal says:

Thanks for quick response. Can we do median filter instead of the MEX file?

14. Vimal says:

Hi shawn…. have you done image segmentation while processing?
If yes can you tell the algorithm?

15. Vimal says:

@Vimal
With one image,,, is this possible?
Even the result with stereo pair is worse with some image.
All the best

16. Vimal says:

Hi Shawn, some pictures work perfectly but others do not. I determined that 20 pixels is the maximum disparity, but it is still not working. Can I get another pair of images?

17. taozi says:

Hi Shawn, I am a Chinese student; sorry for my bad English, I hope you can understand it. I have been reading your website recently. I am trying to get a depth map from a real pair of L-R images using a binocular stereo matching algorithm. I downloaded your Matlab code and ran it, but when I use it on other image pairs, downloaded from
http://vision.middlebury.edu/stereo/, it doesn’t work as well. Even with the same Tsukuba image the result is not the same, and I can’t even see the outlines. I don’t know why; can you tell me where the difference is? I really want to do well on my project, but I have no idea what to do, and I really hope you can help me. Once I have the disparity map, do I need to know the camera focal length and the baseline distance to get the depth map? The focal length and the baseline are measured in meters but the disparity is measured in pixels, so how do I convert between them? Is there any other data I need to know, or something that must be done before I can get the depth map? Best regards, and thanks again.

18. Poo says:

I was trying to run the lankton_stereo code, but I cannot compile edison_wrapper_mex (Mac). Can you help me, please?

19. Don says:

Hi Shawn,

I too am trying to compile this code but am having trouble. I have located the updated website for EDISON:

http://coewww.rutgers.edu/riul/research/code/EDISON/

and put the segm\ and edge\ directories where the compile_edison_wrapper.m expects them to be.

I was getting an error about mac vs windows file format… I’m on windows and I think I needed to change the “/” to “\” so I did:

mex -O edison_wrapper_mex.cpp …
segm\ms.cpp segm\msImageProcessor.cpp segm\msSysPrompt.cpp segm\RAList.cpp segm\rlist.cpp …
edge\BgEdge.cpp edge\BgImage.cpp edge\BgGlobalFc.cpp edge\BgEdgeList.cpp edge\BgEdgeDetect.cpp

Also in BgEdgeDetect.h, I changed the “/” to “\” in the #include statements.

I still get this error from Matlab:

edison_wrapper_mex.cpp
c:\users\donald j. natale iii\desktop\lankton range image\lankton_stereo\msseg\edge\BgEdgeList.h : error C4335: Mac file format detected: please convert the source file to either DOS or UNIX format
edison_wrapper_mex.cpp(134) : warning C4018: '<' : signed/unsigned mismatch
edison_wrapper_mex.cpp(168) : warning C4018: '<' : signed/unsigned mismatch
edison_wrapper_mex.cpp(179) : warning C4018: '<' : signed/unsigned mismatch

I could really use advice or maybe a smack upside the head… am I doing something silly here?

-D

20. Don says:

OH I got it to work.

So what I did was:
1. change all of those slashes to match windows filename convention
2. open all of the source files in visual studio. for many of them a message pops up about some kind of file format inconsistency, asking whether I want it fixed. Click yes, save file, exit.
3. in line 386 of BgEdgeDetect.cpp, the “pow” command is used and my compiler complains that it is ambiguous because it doesn’t know what type the numbers are inside it. So change the line from this:

w = pow(2,(-2*WL_))*factorial(2*WL_)/(factorial(WL_-i)*factorial(WL_+i));

to this:

w = pow(2.,(-2*WL_))*factorial(2*WL_)/(factorial(WL_-i)*factorial(WL_+i));

notice there is a decimal point added after the first “2” . Now it knows to not treat 2 as an integer or whatever.

it now compiles successfully in windows using the compile_edison_wrapper_mex.m script in matlab!

-D

21. Don says:

Hey dude, I was having compiler trouble in windows. One of the non-windows specific things I had to do was in line 386 of BgEdgeDetect.cpp, the “pow” command is used and my compiler complains that it is ambiguous because it doesn’t know what type the numbers are inside it. So change the line from this:

w = pow(2,(-2*WL_))*factorial(2*WL_)/(factorial(WL_-i)*factorial(WL_+i));

to this:

w = pow(2.,(-2*WL_))*factorial(2*WL_)/(factorial(WL_-i)*factorial(WL_+i));

notice there is a decimal point added after the first “2”. Now it knows to not treat 2 as an integer or whatever it was thinking.

If you haven’t tried this yet maybe it will help!

-D

22. Don says:

Ugg… I spoke too soon. When I run the demo.m script it segfaults…

:^(

-D

23. villett says:

Dear Shawn,

Im working on the depth map estimation by using motion parallax, I got the motion vectors(x and y) with matlab. I’m having some problem on getting the absolute value of the horizontal component(Mvx) to get the disparity and also depth map. could you help me with this?

hi, please i need help, when i run demo, i get this error,

Error in ==> modefilt2 at 34
f = modefilt2_mex(img,win,ignore);

Error in ==> stereo at 48
fdsp = modefilt2(dsp,[win_size,win_size],2);

Error in ==> demo at 10
[d p] = stereo(i1,i2, maxs);

Thanks

i have window7 64 bit

26. Shane says:

Hi Shawn,

Just wondering if you could tell me which image is used as the reference image. Is it the right or left?

Thanks

27. Doaa says:

hey shawn,
thanks a lot for your code. You said you applied a median filter to the labels, so I want to ask: what should I do if I want to fit a 1st-order or 2nd-order plane to the labels instead? Also, I get bad results when I apply the code to the Teddy and Cones datasets, even when I change the maxs parameter. Waiting for your reply, please :)

Doaa

28. Doaa says:

@xin wang
hey,
Me too, I am doing my MSc on the estimation of a dense disparity map, and I am trying to do plane fitting on the labels. Can you help me with how to do that in Matlab? :)

29. xglgkai says:

hi shawn?
do you have any new methods about stereo matching and matlab codes?I hope can learn from you and discuss the new method.

30. kuoyenlo says:

Hi Don,

31. James Warhola says:

Is there any way to get rid of fixed window error used in this method.

32. Anonymous says:

Hi there,
I get the following error after running demo:

Invalid MEX-file ‘/MATLAB/stereo_modefilt/modefilt2_mex.mexglx’: libstdc++.so.5: cannot open shared object file: No such file or directory

Any idea how to solve the problem?

34. Pallav Garg says:

Hi,
Thanks for the page.
For the Tsukuba images it works perfectly. But this Matlab code gives a random disparity map for other standard images like cones, venus, and vaso.

35. Gintoki says:

Hello, Shawn. Thanks for your code! But there is a result puzzled me. When I swap the ‘i1(right)’ and ‘i2(left)’, the disparity map becomes quite different and may not be useful. Could you tell me why? Thanks~

36. Chetan Mehra says:

Hello, Shawn. I got the disparity of stereo Images. What other processes can I apply on that Disparity Matrix such that I get better result?

37. Anshul kumar says:

If I already have the disparity image, how can I get the number of pixels from the disparity image? The disparity image is, like any other image, a matrix whose entries are the intensities at each image location. @Anonymous

38. hi Mr. Shawn, my name is bedros and i am trying to buy a fast 3d stereo vision software, can you help me . i am trying to print my 3d photos from fujifil 3d camera. thank you

39. Anonymous says:

how to solve this error :

Undefined function ‘modefilt2_mex’ for input arguments of type ‘double’.

Error in modefilt2 (line 34)
f = modefilt2_mex(img,win,ignore);

Error in demo (line 14)
modefilt = modefilt2(p,[5 5]);

40. bubbles says:

Hello Shawn,

How is Stereo Matching done with convolutions??

Thanks

41. Hi
Can anyone help me by sending the stereo rectification code along with your images?
In my code I am not getting an exact left rectified image; I think it is because of the left H matrix.