The scale-invariant feature transform (SIFT) is a highly successful and robust basis for visual object tracking and is frequently used in machine vision today. It performs well when a target’s appearance changes. However, object tracking schemes based on feature points are prone to target drift during tracking, especially against a complicated background, because many background feature points are treated as foreground when the target’s feature points are updated. This paper proposes a tracking scheme called Combining Foreground/Background SIFT Feature Points (FBSIFT), which dynamically maintains sets of both foreground and background features during online appearance learning. A novel adaptive appearance model learning strategy is proposed to achieve accurate results even in complicated background scenarios. By dividing all feature points into three sets and assigning a score to each SIFT feature point, the scheme identifies an optimal feature point set for the target, which serves as the target appearance model, even when the background is complicated. Furthermore, the target can be identified even when it is occluded. Experimental results demonstrate that the proposed scheme effectively manages tracking drift and tracking failure.