Objects as Context for Part Detection

We present a semantic part detection approach that effectively leverages object information. We use the object appearance and its class as indicators of what parts to expect. We also model the expected relative location of parts inside the objects based on their appearance. We achieve this with a new network module, called OffsetNet, that efficiently predicts a variable number of part locations within a given object. Our model incorporates all these cues to detect parts in the context of their objects. This leads to significantly higher performance for the challenging task of part detection compared to using part appearance alone (+5 mAP on the PASCAL-Part dataset). We also compare to other part detection methods on both PASCAL-Part and CUB200-2011 datasets.
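The idea of predicting a variable number of part locations from an object's appearance can be illustrated with a toy sketch (hand-set NumPy weights, not the paper's actual OffsetNet architecture): a head regresses K candidate (dx, dy) offsets relative to the object, plus a per-candidate presence score, and thresholding the scores yields a variable number of detected parts.

```python
import numpy as np

def offset_head(obj_feat, W_off, W_prs, thresh=0.5):
    """Toy offset-prediction head (illustrative only, not the paper's
    OffsetNet): from an object appearance feature, regress K candidate
    part offsets (dx, dy) relative to the object, plus a presence score
    per candidate, and keep only the likely parts."""
    K = W_prs.shape[0]
    offsets = (obj_feat @ W_off).reshape(K, 2)            # (K, 2) offsets
    scores = 1.0 / (1.0 + np.exp(-(obj_feat @ W_prs.T)))  # sigmoid presence
    keep = scores > thresh                 # variable number of parts kept
    return offsets[keep], scores[keep]

# Tiny deterministic demo with random but fixed weights
rng = np.random.default_rng(0)
feat = rng.normal(size=8)          # hypothetical object appearance feature
W_off = rng.normal(size=(8, 6))    # 3 candidate parts x (dx, dy)
W_prs = rng.normal(size=(3, 8))
offs, scs = offset_head(feat, W_off, W_prs)
print(offs.shape[1], len(offs) == len(scs))
```

The presence scores are what let the number of predicted parts vary per object, rather than always emitting a fixed K.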


Do Semantic Parts Emerge in Convolutional Neural Networks?

Semantic object parts can be useful for several visual recognition tasks. Lately, these tasks have been addressed using Convolutional Neural Networks (CNN), achieving outstanding results. In this work we study whether CNNs learn semantic parts in their internal representation. We investigate the responses of convolutional filters and try to associate their stimuli with semantic parts. We perform two extensive quantitative analyses. First, we use ground-truth part bounding-boxes from the PASCAL-Part dataset to determine how many of those semantic parts emerge in the CNN. We explore this emergence for different layers, network depths, and supervision levels. Second, we collect human judgements in order to study what fraction of all filters systematically fire on any semantic part, even if not annotated in PASCAL-Part. Moreover, we explore several connections between discriminative power and semantics. We identify the most discriminative filters for object recognition, and analyze whether they respond to semantic parts or to other image patches. We also investigate the other direction: we determine which semantic parts are the most discriminative and whether they correspond to those parts emerging in the network. This enables us to gain an even deeper understanding of the role of semantic parts in the network.


Active Search Strategy for Efficient Object Class Detection

Object class detectors typically apply a window classifier to all the windows in a large set, either in a sliding-window manner or using object proposals. In this work, we develop an active search strategy that sequentially chooses the next window to evaluate based on all the information gathered before. This results in a substantial reduction in the number of classifier evaluations and in a more elegant approach in general. Our search strategy is guided by two forces. First, we exploit context as the statistical relation between the appearance of a window and its location relative to the object, as observed in the training set. This enables the search to jump across distant regions in the image (e.g. observing a sky region suggests that cars might be far below) and is done efficiently in a Random Forest framework. Second, we exploit the score of the classifier to attract the search to promising areas surrounding a high-scoring window, and to keep away from areas near low-scoring ones. In experiments on the challenging SUN 2012 dataset, our method matches the detection accuracy of evaluating all windows independently, while evaluating 9x fewer windows. It even outperforms exhaustive evaluation when evaluating 3x fewer windows.
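The second force — attraction towards high-scoring windows and repulsion from low-scoring ones — can be sketched in a toy 1-D setting. This is an illustrative simplification only: the paper's method additionally exploits appearance-based context via a Random Forest, which is omitted here.

```python
import numpy as np

def active_search(score_fn, positions, n_evals, sigma=1.0):
    """Toy active search: evaluate windows sequentially, steering towards
    neighbourhoods of high-scoring windows (score > 0.5 attracts) and away
    from low-scoring ones (score < 0.5 repels). Illustrative sketch only."""
    n = len(positions)
    prior = np.zeros(n)                 # accumulated attraction/repulsion
    evaluated = {}
    for _ in range(n_evals):
        # pick the not-yet-evaluated window with the highest current priority
        cand = [i for i in range(n) if i not in evaluated]
        i = max(cand, key=lambda j: prior[j])
        s = score_fn(positions[i])
        evaluated[i] = s
        # spread attraction/repulsion to nearby windows with a Gaussian kernel
        d = np.abs(positions - positions[i])
        prior += (s - 0.5) * np.exp(-(d ** 2) / (2 * sigma ** 2))
    best = max(evaluated, key=evaluated.get)
    return best, evaluated

# Toy "classifier": one object centred at position 7 on a 1-D image axis
positions = np.arange(20, dtype=float)
score_fn = lambda x: float(np.exp(-(x - 7.0) ** 2 / 4.0))
best, seen = active_search(score_fn, positions, n_evals=8)
print(len(seen))  # only 8 of the 20 windows were evaluated
```

The returned detection is simply the highest-scoring window among those actually evaluated, which is how the method can match exhaustive evaluation while scoring far fewer windows.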


[MSc Thesis]
Development of a Non-contact Heart Rate Measurement System (MSc thesis)

Given the importance of the physiological signal known as heart rate, a contactless heart rate measurement system can be extremely beneficial in situations where traditional methods are not convenient. These usually involve some type of contact with the body, creating discomfort for certain patients, such as burn victims or newborns. A non-contact heart rate measurement system has been implemented and tested for this MSc dissertation, based on recent work that uses images from a webcam. We have studied how to improve the weakest points of previous approaches, such as the subject’s face tracking or the signal pre-processing. Taking inspiration from existing ideas, as well as implementing our original ones, the system has been adapted to achieve higher performance. For example, a novel combination of face features improved the face-tracking block, while signal detrending as a pre-processing step increased the overall accuracy of the system. We have also carried out experiments to test the developed system, producing satisfactory results when no large motions are present, thereby demonstrating the feasibility of the system under restricted conditions. Supervised by Iain Murray.
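The role of detrending can be illustrated on a synthetic signal: a pulse-like oscillation riding on a slow baseline drift, as would arise from gradual lighting or position changes. This sketch uses a simple moving-average detrend, which may differ from the detrending method actually used in the thesis:

```python
import numpy as np

def detrend(signal, win):
    """Remove slow baseline drift by subtracting a centred moving average.
    Illustrative sketch; the thesis may use a different detrending method
    (e.g. smoothness-priors detrending)."""
    kernel = np.ones(win) / win
    baseline = np.convolve(signal, kernel, mode="same")
    return signal - baseline

# Synthetic example: 1.2 Hz "pulse" plus a slow drift, sampled at 30 fps
fs = 30
t = np.arange(0, 10, 1 / fs)
pulse = 0.5 * np.sin(2 * np.pi * 1.2 * t)   # ~72 bpm component
drift = 0.05 * t                            # slow baseline drift
clean = detrend(pulse + drift, win=45)      # 1.5 s averaging window

# After detrending, the dominant spectral peak sits at the pulse frequency
spec = np.abs(np.fft.rfft(clean))
freqs = np.fft.rfftfreq(len(clean), 1 / fs)
print(round(freqs[spec.argmax()], 1))  # 1.2
```

Without the detrend, the low-frequency drift would dominate the spectrum and bias the heart-rate estimate, which is why detrending as a pre-processing step improves overall accuracy.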


[Final Degree Project (Catalan)]

[Poster (Catalan)]

Android Mobile Device to Aid Visually Impaired People (Final Degree Project)

A significant percentage of the human population suffers from an impaired capacity to distinguish or even see colours. For them, everyday tasks like navigating a train or metro network map become demanding. We present a novel technique for extracting colour information from everyday natural stimuli and presenting it to visually impaired users as pleasant, non-invasive sound. This technique was implemented on an Android mobile device. In this implementation, colour information is extracted from the input image and categorised according to how human observers segment the colour space. This information is subsequently converted into sound and sent to the user via speakers or headphones. The resulting app is configurable and accessible.
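The categorise-then-sonify pipeline can be sketched as follows. The six equal hue sectors and the tone frequencies below are hypothetical stand-ins; the app's actual colour categories (derived from human segmentation of colour space) and its sound mapping are not specified here.

```python
import colorsys

# Hypothetical category-to-tone table (Hz); not the app's real mapping.
CATEGORY_TONES = {"red": 261.63, "yellow": 329.63, "green": 392.00,
                  "cyan": 440.00, "blue": 493.88, "magenta": 523.25}

def colour_to_tone(r, g, b):
    """Map an RGB colour (0-255 per channel) to a named hue category and a
    tone frequency. Illustrative sketch of the colour-to-sound idea using
    six equal hue sectors, not the app's actual categorisation."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    names = list(CATEGORY_TONES)
    category = names[int(h * 6) % 6]     # which of 6 equal hue sectors
    return category, CATEGORY_TONES[category]

print(colour_to_tone(255, 0, 0))   # ('red', 261.63)
print(colour_to_tone(0, 0, 255))   # ('blue', 493.88)
```

In the real app the resulting tone would be synthesised and played through speakers or headphones; a practical version would also need to handle low-saturation colours (greys, black, white), which have no meaningful hue.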