3 Facial Landmark Detection with Spatio-temporal Modeling
Romain BELMONTE¹, Pierre TIRILLY¹, Ioan Marius BILASCO¹, Nacim IHADDADENE² and Chaabane DJERABA¹
¹ University of Lille, France
² Junia ISEN, Lille, France
Despite considerable progress in recent years, the performance of facial landmark detection under uncontrolled conditions is still not fully satisfactory (see section 2.3), and the problem continues to be studied largely from still images. Yet, with the ubiquity of video sensors, the vast majority of applications rely on videos. Current approaches, when applied to videos, usually perform tracking by detection: landmarks are detected independently in each frame, so the temporal dimension is not leveraged (see section 1.2). Recent work has shown that taking video consistency into account helps to deal with the variability in facial appearance and ambient environment encountered under uncontrolled conditions. Such work generally couples a CNN with an RNN, which provides only limited temporal connectivity, applied to feature maps with a high level of abstraction. These architectures can model global motion (e.g. head motion), but they capture local motion less easily, such as the movements of the eyes or the lips, which is important for detecting facial landmarks accurately.
Video analysis has long been studied to tackle a variety of problems, including human behaviour understanding. Interest in video is now growing due to the explosion in the number of videos shared on the Internet, media, surveillance, ...