Saliency Driven Object recognition in egocentric videos with deep CNN

Abstract

The problem of object recognition in natural scenes has recently been addressed successfully with Deep Convolutional Neural Networks (Deep CNNs), which have brought a significant breakthrough in recognition scores. The computational efficiency of Deep CNNs, which depends on their depth, allows for their use in real-time applications. One of the key issues here is to reduce the number of windows selected from each image and submitted to a Deep CNN. This is usually solved by preliminary segmentation and selection of specific windows that score highly on "objectness" or on other indicators of the possible location of objects. In this paper we propose a Deep CNN approach and a general framework for recognition of objects in a real-time scenario and from an egocentric perspective. Here the window of interest is built from a visual attention map computed over gaze fixations measured by a glasses-worn eye-tracker. The application of this set-up is an interactive, user-friendly environment for upper-limb amputees. Vision has to help the subject control a worn neuro-prosthesis when few muscles remain and EMG control becomes inefficient. The recognition results on a specifically recorded corpus of 151 videos with simple geometrical objects show a mAP of 64.6% and a computational time at the generalization stage shorter than the duration of a visual fixation on the object of interest.
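The abstract does not say how the attention map or the window of interest is computed. As a minimal sketch only, one common way to turn gaze fixations into a single candidate window is to sum an isotropic Gaussian around each fixation and take the tightest box around the high-saliency pixels. The function names, the Gaussian width `sigma`, and the `threshold` below are illustrative assumptions, not the authors' method or parameters.

```python
import numpy as np


def gaze_saliency_map(fixations, height, width, sigma=20.0):
    """Sum one isotropic Gaussian per (x, y) fixation and scale the peak to 1.

    `sigma` (in pixels) is an assumed value, not taken from the paper.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    saliency = np.zeros((height, width), dtype=np.float64)
    for fx, fy in fixations:
        sq_dist = (xs - fx) ** 2 + (ys - fy) ** 2
        saliency += np.exp(-sq_dist / (2.0 * sigma ** 2))
    peak = saliency.max()
    # With no fixations the map stays all zeros.
    return saliency / peak if peak > 0 else saliency


def window_of_interest(saliency, threshold=0.5):
    """Return the tightest box (x0, y0, x1, y1), inclusive, around all pixels
    whose saliency is at least `threshold`, or None if there are none."""
    rows, cols = np.nonzero(saliency >= threshold)
    if rows.size == 0:
        return None
    return int(cols.min()), int(rows.min()), int(cols.max()), int(rows.max())


# Hypothetical fixations in pixel coordinates on a 320x240 frame.
fixations = [(100, 80), (110, 85)]
saliency = gaze_saliency_map(fixations, height=240, width=320)
box = window_of_interest(saliency)
```

Under these assumptions only the single crop described by `box` would be passed to the classifier, rather than a large set of candidate windows, which is the reduction the abstract argues for.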

Most cited references 17

Region-Based Convolutional Networks for Accurate Object Detection and Segmentation.

Ross Girshick, Jeff Donahue, Trevor Darrell … (2016)

Object detection performance, as measured on the canonical PASCAL VOC Challenge datasets, plateaued in the final years of the competition. The best-performing methods were complex ensemble systems that typically combined multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 50 percent relative to the previous best result on VOC 2012, achieving a mAP of 62.4 percent. Our approach combines two ideas: (1) one can apply high-capacity convolutional networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data are scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, boosts performance significantly. Since we combine region proposals with CNNs, we call the resulting model an R-CNN or Region-based Convolutional Network. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.

Targeted muscle reinnervation for real-time myoelectric control of multifunction artificial arms.

Todd Kuiken, Guanglin Li, Blair A. Lock … (2009)

Improving the function of prosthetic arms remains a challenge, because access to the neural-control information for the arm is lost during amputation. A surgical technique called targeted muscle reinnervation (TMR) transfers residual arm nerves to alternative muscle sites. After reinnervation, these target muscles produce electromyogram (EMG) signals on the surface of the skin that can be measured and used to control prosthetic arms. The objective was to assess the performance of patients with upper-limb amputation who had undergone TMR surgery, using a pattern-recognition algorithm to decode EMG signals and control prosthetic-arm motions. The study was conducted between January 2007 and January 2008 at the Rehabilitation Institute of Chicago among 5 patients with shoulder-disarticulation or transhumeral amputations who underwent TMR surgery between February 2002 and October 2006 and 5 control participants without amputation. Surface EMG signals were recorded from all participants and decoded using a pattern-recognition algorithm. The decoding program controlled the movement of a virtual prosthetic arm. All participants were instructed to perform various arm movements, and their abilities to control the virtual prosthetic arm were measured. In addition, TMR patients used the same control system to operate advanced arm prosthesis prototypes. Performance metrics measured during virtual arm movements included motion selection time, motion completion time, and motion completion ("success") rate. The TMR patients were able to repeatedly perform 10 different elbow, wrist, and hand motions with the virtual prosthetic arm. For these patients, the mean motion selection and motion completion times for elbow and wrist movements were 0.22 seconds (SD, 0.06) and 1.29 seconds (SD, 0.15), respectively. These times were 0.06 seconds and 0.21 seconds longer than the mean times for control participants. For TMR patients, the mean motion selection and motion completion times for hand-grasp patterns were 0.38 seconds (SD, 0.12) and 1.54 seconds (SD, 0.27), respectively. These patients successfully completed a mean of 96.3% (SD, 3.8) of elbow and wrist movements and 86.9% (SD, 13.9) of hand movements within 5 seconds, compared with 100% (SD, 0) and 96.7% (SD, 4.7) completed by controls. Three of the patients were able to demonstrate the use of this control system in advanced prostheses, including motorized shoulders, elbows, wrists, and hands. These results suggest that reinnervated muscles can produce sufficient EMG information for real-time control of advanced artificial arms.