Visual Tracking via Joint Sparse Representation

Given a bounding box that defines the object of interest (the target) in the first frame of a video sequence, the goal of a tracker is to determine the object’s bounding box in subsequent frames. In contrast to specific trackers, where the object model is learned off-line, general tracking is more challenging, since the object is not known in advance and its model must be learned over the course of the video sequence. The primary challenges in visual tracking are target appearance change and occlusion; other challenges arise from variations in illumination, scale, and camera motion.

Sparse representation has recently shown appealing results in various computer vision applications. In sparse-representation-based trackers, each candidate is represented as a linear combination of a few elements (atoms) of a dictionary composed of previously found target images. The coefficients of this representation are then used to find the best candidate. Besides handling illumination changes and mild pose changes, these trackers also attempt to tackle occlusion.
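A minimal Python/NumPy sketch of this idea follows. It assumes an ℓ1-regularized least-squares (lasso) model solved with ISTA (iterative soft thresholding), and it assumes the best candidate is the one whose sparse code reconstructs it with the smallest error. The function names `ista_lasso` and `best_candidate` and the value of `lam` are hypothetical; they are not taken from the project's MATLAB code.

```python
import numpy as np

def ista_lasso(D, y, lam=0.05, n_iter=500):
    """Sparse code of y over dictionary D (columns = atoms).

    Solves min_c 0.5*||y - D c||^2 + lam*||c||_1 by ISTA (proximal gradient).
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = c - step * (D.T @ (D @ c - y))                          # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)    # soft threshold
    return c

def best_candidate(D, candidates, lam=0.05):
    """Assumed selection rule: smallest reconstruction error wins.

    Returns the index of the best candidate and all squared errors.
    """
    errors = []
    for y in candidates:
        c = ista_lasso(D, y, lam)
        errors.append(np.sum((y - D @ c) ** 2))
    errors = np.array(errors)
    return int(np.argmin(errors)), errors
```

A candidate that lies close to the span of a few target templates gets a small error, while a background patch needs many atoms and still leaves a large residual.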

This paper presents a robust tracking approach that handles challenges such as occlusion and appearance change. The target is partitioned into a number of patches, and the appearance of each patch is modeled using a dictionary composed of the corresponding target patches in previous frames. In each frame, the target is found among a set of candidates generated by a particle filter, via a likelihood measure that is shown to be proportional to the sum of the patch-reconstruction errors of each candidate. Since the target’s appearance often changes slowly in a video sequence, it is assumed that the target in the current frame and the best candidates of a small number of previous frames belong to a common subspace. This assumption is imposed using joint sparse representation, which forces the target and the previous best candidates to share a common sparsity pattern. Moreover, an occlusion detection scheme is proposed that uses the patch-reconstruction errors and a prior probability of occlusion, extracted from an adaptive Markov chain, to calculate the probability of occlusion for each patch. In each frame, occluded patches are excluded when updating the dictionary. Extensive experimental results on several challenging sequences show that the proposed method outperforms state-of-the-art trackers.
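The following sketch illustrates, under stated assumptions, the patchwise joint sparse coding and the per-patch occlusion probability described above. It assumes the common sparsity pattern is imposed with an ℓ1,2 (row-sparsity) penalty solved by proximal gradient, it simply ranks candidates by the summed patch-reconstruction error (lower is better) instead of reproducing the paper's exact likelihood, and it replaces the adaptive Markov chain with a fixed two-state chain and hypothetical exponential error densities. All function names and parameter values are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def joint_sparse_code(D, Y, lam=0.1, n_iter=500):
    """Proximal gradient for min_C 0.5*||Y - D C||_F^2 + lam * sum_i ||C[i, :]||_2.

    The row-group (l1,2) penalty zeroes whole rows of C, so every column of Y
    (current candidate patch + previous best patches) uses the same atoms.
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    C = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        Z = C - step * (D.T @ (D @ C - Y))                     # gradient step
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - step * lam / np.maximum(row_norms, 1e-12), 0.0)
        C = shrink * Z                                         # group soft threshold
    return C

def patch_errors(dicts, cand_patches, prev_patches, lam=0.1):
    """Squared reconstruction error of each patch of one candidate.

    dicts[k]:        (d, n) dictionary for patch k
    cand_patches[k]: (d,)   patch k of the current candidate
    prev_patches[k]: (d, T) patch k of the T previous best candidates
    """
    errs = []
    for D, y, P in zip(dicts, cand_patches, prev_patches):
        Y = np.column_stack([y, P])          # current patch first, then history
        C = joint_sparse_code(D, Y, lam)
        errs.append(np.sum((y - D @ C[:, 0]) ** 2))
    return np.array(errs)

def occlusion_posterior(errs, prev_post, a_oo=0.8, a_vo=0.1,
                        rate_vis=20.0, rate_occ=2.0):
    """Per-patch probability of occlusion via Bayes' rule.

    Prior: one step of a fixed two-state Markov chain with hypothetical
    transitions a_oo = P(occluded -> occluded), a_vo = P(visible -> occluded).
    Likelihood: hypothetical exponential densities of the patch error; visible
    patches favour small errors (rate_vis), occluded ones larger errors (rate_occ).
    """
    prior = prev_post * a_oo + (1.0 - prev_post) * a_vo
    lik_occ = rate_occ * np.exp(-rate_occ * errs)
    lik_vis = rate_vis * np.exp(-rate_vis * errs)
    return prior * lik_occ / (prior * lik_occ + (1.0 - prior) * lik_vis)
```

In a full tracker, each particle-filter candidate would be scored by `patch_errors(...).sum()`, and patches whose occlusion probability exceeds a threshold (0.5 would be one assumed choice) would be left out of the dictionary update.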

Related Publication:

  • Zarezade A., Rabiee H. R., Soltani-Farani A., and Khajenezhad A., “Patchwise Joint Sparse Tracking with Occlusion Detection”, IEEE Transactions on Image Processing, 2014 (Accepted)

Project code:

  • This code is written for MATLAB and contains routines for several visual tracking methods used in the above publication. Please cite the above work if you use this software, and contact the first author in case of any problems.

People involved:

  • Ali Zarezade, Hamid R. Rabiee, Ali Soltani-Farani, Ahmad Khajenezhad


Sparse Signal Processing Group