Peripheral-Foveal Vision for Real-time Object Recognition and Tracking in Video
Human object recognition in a physical 3-D environment is still far superior to that of any robotic vision system. We believe that one reason (out of many) for this, one that has not heretofore been significantly exploited in the artificial vision literature, is that humans use a fovea to fixate on or near an object, thus obtaining a very high-resolution image of the object and rendering it easy to recognize. In this paper, we present a novel method for identifying and tracking objects in multi-resolution digital video of partially cluttered environments. Our method is motivated by biological vision systems and uses a learned “attentive” interest map on a low-resolution data stream to direct a high-resolution “fovea.” Objects that are recognized in the fovea can then be tracked using peripheral vision. Because object recognition is run only on a small foveal image, our system achieves real-time object recognition and tracking performance well beyond that of simpler systems.
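The step in which a low-resolution interest map directs the high-resolution fovea can be illustrated with a minimal sketch. It is not the implementation described in this paper. It assumes the interest map is already given as a 2-D grid of scores (in the paper it is learned), that the high-resolution frame is an integer factor `scale` larger than the low-resolution one, and that the fovea is simply centered on the highest-scoring cell. The names `select_fovea` and `crop` and all parameters are hypothetical, and recognition and tracking are omitted.

```python
def select_fovea(interest_map, scale, fovea_size, hi_res_shape):
    """Return (top, left, bottom, right) of the foveal window in high-res pixels.

    Hypothetical helper: interest_map is a 2-D list of scores on the
    low-resolution grid; scale is the assumed high-res/low-res ratio.
    """
    rows, cols = len(interest_map), len(interest_map[0])
    # Pick the low-resolution cell with the highest interest score.
    best_r, best_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: interest_map[rc[0]][rc[1]],
    )
    # Map the centre of that cell into high-resolution coordinates.
    cy = int((best_r + 0.5) * scale)
    cx = int((best_c + 0.5) * scale)
    height, width = hi_res_shape
    fh, fw = fovea_size
    # Clamp so the foveal window stays entirely inside the frame.
    top = min(max(cy - fh // 2, 0), height - fh)
    left = min(max(cx - fw // 2, 0), width - fw)
    return top, left, top + fh, left + fw


def crop(image, box):
    """Extract the foveal sub-image (image is a 2-D list of pixel values)."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]
```

Only the cropped window would then be passed to the (comparatively expensive) recognizer, which is why the per-frame recognition cost depends on the fovea size rather than on the full frame size.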