What is Computer Vision
The super short answer: It’s what I study at Georgia Tech.
The longer answer: below…
As part of a program I’m involved in, I needed to describe my research succinctly and from a very high level to forty of my colleagues, many of whom did not have a technical background. I drafted the following short essay to describe what Computer Vision is, and what I do with it right now:
Computer Vision: A Summary
Sight, in my opinion, is the most incredible of our five senses. I conduct research in a field known as “Computer Vision.” This area focuses on the higher-level parts of a fascinating overall problem: Teach computers to see like people.
Teaching computers to see can be considered in three layers. The first is image acquisition. Here, I use the term image to mean data that a computer can interpret. This includes pictures like those from a camera, three-dimensional brain scans, and much more. These images are fantastic tools for people, and people alone. Computers cannot understand them without something more.
The next phase is image processing. This step makes the acquired images ‘nicer.’ This includes signal processing problems like removing noise, enhancing colors, and making important features more apparent. A processed image, though, still holds no meaning for the computer… only for the humans who look at it.
The third layer is where my research is focused. The computer vision layer takes these processed images and assigns meaning to them. Three key activities here are segmentation, registration, and tracking. Segmentation involves finding the boundary of an object or objects of interest in a scene; registration is the process of lining up two images; and tracking is the process of determining the position of an object over time in a video sequence.
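To make the idea of segmentation a little more concrete, here is a minimal sketch (my illustration, not part of the original essay) of its simplest form: intensity thresholding, where the “object of interest” is just the set of pixels brighter than some cutoff. Real segmentation methods are far more sophisticated, but the goal is the same: decide which pixels belong to the object.

```python
import numpy as np

def threshold_segment(image, threshold):
    """Return a binary mask marking pixels brighter than `threshold`.

    This is the simplest possible segmentation: the boundary of the
    object is just the edge of the thresholded region.
    """
    return image > threshold

# A toy 5x5 grayscale "image": dark background, bright square in the middle.
image = np.zeros((5, 5))
image[1:4, 1:4] = 1.0

mask = threshold_segment(image, 0.5)
# `mask` now marks the 3x3 bright square; its outline is the segmentation.
```

Registration and tracking build on the same kind of pixel-level reasoning: registration searches for the transformation that best lines two such images up, and tracking repeats a localization step like this one frame after frame.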
Knowing the shape, position, and motion of objects in images is vital to understanding those images. The state of the art in this field can already be used to solve real-life problems. A computer can examine structures in a 3D image of a brain to find their shape so doctors can examine them. A robot can align images that it captures from on-board cameras to make a complete map of the area it is in. A surveillance system can automatically track a suspicious vehicle as it drives past.
These are just a few of the many uses of this type of technology. Computer vision technology has improved by leaps and bounds in recent years, but when we compare it to the ability of our own human vision system it is still found very wanting. My research is focused on improving the capabilities of computer vision systems. The techniques I use to do this are very mathematical, and rooted firmly in optimal control theory.
My primary research focus and publication record have been in medical imaging, but in the last year they have shifted to include visual tracking as well. My initial projects involved segmentation of small structures in MRI images of the brain. While solving these problems, I developed a powerful technique whose applications I am still researching and publishing. I also developed an algorithm that allows cardiologists to examine the interior walls of blood vessels in ways not previously possible, by automatically registering two types of imagery according to physical landmarks.
More recently, I have expanded my attention to include visual tracking problems. Following objects in video is a challenging task, and I have spent the last six months learning the state of the art in this area and making contributions. I have developed systems that are capable of tracking objects that move quickly, move off-camera, or change drastically in appearance. These problems require the investigation of new techniques and approaches and remain an exciting area of study.
One of the most fascinating parts about computer vision is the wide applicability of the techniques and solutions. It is the versatility of this field that originally drew me in, and that continues to keep me interested.