Broadly, my research concerns modeling and emulating the human visual processing mechanism with computer algorithms, as well as practical problems that involve such techniques, e.g. biometrics, intelligent Human-Computer Interaction and surveillance. Research areas that interest me include:
Computer Vision
Machine Learning and Pattern Recognition
Image and Video Processing
Computer Graphics (esp. the “grey” area between CV and CG)
Human-Computer Interaction
Artificial Intelligence
We proposed a novel classification algorithm in the boosting family called SODA-Boosting (where SODA stands for Second Order Discriminant Analysis). SODA-Boosting aims at efficiently learning discriminative weak classifiers based on linear features that can be computed in closed form. It extends the idea of our MRC-Boosting algorithm and can serve as a generic binary classifier.
As an application, SODA-Boosting was employed in image-based gender recognition. Experimental results on the publicly available FERET database showed that SODA-Boosting achieved accuracy comparable to state-of-the-art approaches and outperformed related boosting-based algorithms. The algorithm has been integrated into our facial recognition software, which classifies gender in real time from webcam feeds.
We show that face recognition can be modeled as a special two-class problem, namely “target detection”, where a target class should be discriminated from the surrounding clutter class. Motivated by this view, we propose a classification algorithm called MRC-Boosting to attack the face recognition problem. Unlike conventional boosting approaches widely employed in the computer vision community, MRC-Boosting is computationally efficient because at each iteration the optimal feature is computed in closed form, requiring neither exhaustive search nor time-consuming numerical optimization. Moreover, we show that MRC-Boosting is especially efficient for learning a face recognizer. As a result, this algorithm provides a promising solution to face recognition: it is effective in handling large intra-personal variations (e.g. pose and lighting) and can learn efficiently from a large number of training samples.
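To make the closed-form idea concrete, here is a minimal sketch of a boosting loop whose weak learner is obtained by solving one linear system per round rather than by searching over candidate features. It does not reproduce the MRC-Boosting feature criterion: as a stand-in it assumes a sample-weighted Fisher linear discriminant, and the names `fit_weak`, `boost`, `predict` and the `reg` parameter are hypothetical.

```python
import numpy as np

def fit_weak(X, y, w, reg=1e-3):
    """Closed-form linear weak learner (assumed stand-in: weighted Fisher discriminant).

    X: (n, d) features, y: labels in {-1, +1} with both classes present,
    w: positive sample weights. Returns direction a and offset b.
    """
    pos, neg = y == 1, y == -1
    wp = w[pos] / w[pos].sum()          # per-class normalized weights
    wn = w[neg] / w[neg].sum()
    mu_p, mu_n = wp @ X[pos], wn @ X[neg]
    dp, dn = X[pos] - mu_p, X[neg] - mu_n
    # Weighted within-class scatter, regularized so the solve is well posed.
    Sw = dp.T @ (dp * wp[:, None]) + dn.T @ (dn * wn[:, None])
    Sw += reg * np.eye(X.shape[1])
    a = np.linalg.solve(Sw, mu_p - mu_n)    # no search, no iterative optimizer
    b = -0.5 * a @ (mu_p + mu_n)            # threshold midway between class means
    return a, b

def boost(X, y, rounds=10):
    """Discrete AdaBoost using the closed-form weak learner above."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        a, b = fit_weak(X, y, w)
        pred = np.where(X @ a + b >= 0, 1, -1)
        err = w[pred != y].sum()
        if err >= 0.5:                      # weak learner no better than chance
            break
        err = max(err, 1e-10)               # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        model.append((alpha, a, b))
        w = w * np.exp(-alpha * y * pred)   # up-weight misclassified samples
        w /= w.sum()
    return model

def predict(model, X):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = np.zeros(len(X))
    for alpha, a, b in model:
        score += alpha * np.where(X @ a + b >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

Calling `boost(X, y)` with labels in {-1, +1} returns a list of `(alpha, a, b)` triples, and `predict` takes the sign of their weighted vote. The per-round cost is dominated by a single d-by-d linear solve, which is the property the paragraph above refers to.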
Background estimation, i.e. automatic recovery of the background image from a sequence of images containing moving foreground objects, is an important module in many applications, e.g. surveillance and video segmentation. We show that background estimation can be modeled as a low-level vision problem, formulated under the energy minimization framework, and solved with Loopy Belief Propagation. This leads to a simple yet effective approach for background estimation. The background can be robustly recovered even when the occluding foreground objects stay still for a long time. Furthermore, no motion information needs to be known or estimated for the foreground objects, which implies that the background can be recovered from a set of frames that are not temporally consecutive.
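For readers unfamiliar with this framework, the following is a generic pairwise energy and the standard min-sum message update used by Loopy Belief Propagation. It is a sketch under assumptions, not the formulation from the paper: the label $f_p$ is assumed here to be the index of the input frame that pixel $p$ is copied from, and the data term $D_p$, the smoothness term $V$, and the weight $\lambda$ are left abstract.

```latex
% Assumed generic pairwise MRF energy over a labeling f (one label per pixel p):
E(f) = \sum_{p} D_p(f_p) + \lambda \sum_{(p,q) \in \mathcal{N}} V(f_p, f_q)

% Standard min-sum message from pixel p to its neighbor q:
m_{p \to q}(f_q) = \min_{f_p} \Big[ D_p(f_p) + \lambda V(f_p, f_q)
                   + \sum_{s \in \mathcal{N}(p) \setminus \{q\}} m_{s \to p}(f_p) \Big]

% Label chosen at pixel p once the messages have been iterated:
f_p^{*} = \arg\min_{f_p} \Big[ D_p(f_p) + \sum_{s \in \mathcal{N}(p)} m_{s \to p}(f_p) \Big]
```

Because the labels in this sketch only say which frame a pixel comes from, nothing in the energy depends on frame order, which is consistent with the remark above that the frames need not be temporally consecutive.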
A novel tandem-free solution for multiparty VoIP conferences called PASS (Peer-Aware Silence Suppression) is proposed. In contrast to conventional tandem-free solutions, PASS performs silence suppression and speaker selection in a completely distributed fashion. This configuration leads to better scalability, lower bandwidth usage and jitter buffer delay, and higher compatibility with a wide variety of network topologies. Moreover, a novel algorithm is devised for robust silence suppression. Based on machine learning techniques, this approach reliably measures true voice activity even under complex environmental noise, resulting in accurate and transparent speaker selection.
@inproceedings { Xu08CVPR_BpBg,
author = { Xun Xu and Thomas S. Huang },
title = { A Loopy Belief Propagation Approach for Robust Background Estimation },
booktitle = { 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008) },
year = { 2008 },
address = { Anchorage, Alaska },
publisher = { IEEE Computer Society },
month = { June 24-26 },
doi = { 10.1109/CVPR.2008.4587543 }
}
@inproceedings { Hu07MCAM,
author = { Yuxiao Hu and Zhenqiu Zhang and Xun Xu and Yun Fu and Thomas S. Huang },
title = { Building Large Scale 3D Face Database for Face Analysis },
booktitle = { MCAM 2007: International Workshop on Multimedia Content Analysis and Mining },
year = { 2007 },
pages = { 343--350 },
address = { Weihai, China },
month = { June }
}
@inproceedings { Liu07MIPPR,
author = { Ming Liu and Xun Xu and Thomas S. Huang },
title = { Audio-Visual Gender Recognition },
booktitle = { MIPPR'07: The Fourth International Symposium on Multispectral Image Processing and Pattern Recognition },
year = { 2007 },
address = { Wuhan, China },
month = { November }
}
@inproceedings { Xu07AMFG_Gender,
author = { Xun Xu and Thomas S. Huang },
title = { SODA-Boosting and Its Application to Gender Recognition },
booktitle = { LNCS 4778: 2007 IEEE International Workshop on Analysis and Modeling of Faces and Gestures (AMFG), in conjunction with ICCV },
year = { 2007 },
pages = { 193--204 },
address = { Rio de Janeiro, Brazil },
month = { October },
publisher = { Springer-Verlag },
doi = { 10.1007/978-3-540-75690-3 }
}
@inproceedings { Xu06ICME_PASS,
author = { Xun Xu and Li-Wei He and Dinei Flor\^{e}ncio and Yong Rui },
title = { PASS: Peer-Aware Silence Suppression for Internet Voice Conferences },
booktitle = { 2006 IEEE International Conference on Multimedia \& Expo (ICME 2006) },
year = { 2006 },
doi = { 10.1109/ICME.2006.262680 }
}
@inproceedings { Tu06ICPR_SIPPCA,
author = { Jilin Tu and Aleksandar Ivanovic and Xun Xu and Fei-Fei Li and Thomas S. Huang },
title = { Variational Shift Invariant Probabilistic PCA for Face Recognition },
booktitle = { 18th International Conference on Pattern Recognition (ICPR 2006) },
year = { 2006 }
}
@inproceedings { Xu06ICME_MeetingFR,
author = { Xun Xu and Yong Rui and Thomas S. Huang },
title = { Recognizing Faces in Recorded Meetings via MRC-Boosting },
booktitle = { 2006 IEEE International Conference on Multimedia \& Expo (ICME 2006) },
year = { 2006 },
doi = { 10.1109/ICME.2006.262860 }
}
@inproceedings { Zhang05ICME_Indecisive,
author = { Zhenqiu Zhang and Xun Xu and Thomas S. Huang },
title = { Indecisive Classifier },
booktitle = { 2005 IEEE International Conference on Multimedia \& Expo (ICME 2005) },
year = { 2005 }
}
@inproceedings { Xu05ICCV_MRCBoost,
author = { Xun Xu and Thomas S. Huang },
title = { Face Recognition with MRC-Boosting },
booktitle = { 10th IEEE International Conference on Computer Vision (ICCV 2005) },
year = { 2005 },
volume = { 2 },
pages = { 1770--1777 },
doi = { 10.1109/ICCV.2005.93 }
}
@inproceedings { Xu04FG_AMM,
author = { Xun Xu and Changshui Zhang and Thomas S. Huang },
title = { Active Morphable Model: An Efficient Method for Face Analysis },
booktitle = { Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2004) },
year = { 2004 },
pages = { 837--842 },
doi = { 10.1109/AFGR.2004.1301638 }
}
@unpublished { Xu03_Flowchart,
author = { Xun Xu and Zhouchen Lin and Yantao Li and Changshui Zhang },
title = { Robust Flowchart Understanding for Pen-based User Interface },
note = { In preparation },
year = { 2003 }
}
@phdthesis { Xu03thesis_AMM,
author = { Xun Xu },
title = { Active Morphable Model for the Analysis and Synthesis of Human Faces (In Chinese) },
school = { Tsinghua University },
year = { 2003 },
type = { Master's thesis }
}
@inproceedings { Xu02AROB_MultiAgentPlant,
author = { Xun Xu and Changshui Zhang },
title = { Multi-agent Developmental Model of Plants },
booktitle = { 7th Conference on Artificial Life and Robotics (AROB'02) },
year = { 2002 }
}
@phdthesis { Xu00thesis_LoginSystem,
author = { Xun Xu },
title = { A Computer Login System Based on Face and Fingerprint Recognition (In Chinese) },
school = { Tsinghua University },
year = { 2000 },
type = { Bachelor's thesis }
}
“System and Method for Shape Recognition of Hand-drawn Objects”, co-invented with Yantao Li, Zhouchen Lin and Jian Wang. (US Patent 7,324,691; European Patent Appl. No. 04019840.0; Microsoft Ref. No. 306517.05)
“Peer-Aware Ranking of Voice Streams”, co-invented with Li-Wei He and Dinei A. Florêncio. Filed in Mar. 2006.