Probabilistic sequence models for image sequence processing and recognition

Author: Dreuw, Philippe
Document type: doctoralThesis
Publication date: 2012
Publisher: Publikationsserver der RWTH Aachen University
Keywords: info:eu-repo/classification/ddc/004 / hidden Markov model / principal component analysis / optical character recognition / multilayer perceptron / object tracking / Viterbi algorithm / American sign language / Nederlandse Gebarentaal / Deutsche Gebärdensprache / sign language / computer science / Irish Sign Language / gesture recognition / sign language recognition / handwriting recognition
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27146059
Data source: BASE; original catalog
Link(s): https://publications.rwth-aachen.de/record/82808

This PhD thesis investigates three image sequence labeling problems: optical character recognition (OCR), object tracking, and automatic sign language recognition (ASLR). We examine which concepts and ideas from speech recognition can be transferred to these problems. For each task we propose an approach that builds on methods known from speech recognition and is adapted to the problem at hand. In particular, we describe our hidden Markov model (HMM) based image sequence recognition system, which has been adopted from a large vocabulary continuous speech recognition (LVCSR) framework and extended for these tasks. For OCR, we present our RWTH Aachen University Optical Character Recognition (RWTH OCR) system, which has been developed within the scope of this thesis. We analyze simple appearance-based features in combination with complex training algorithms. Detailed discussions of discriminative features, discriminative training, and a novel discriminative confidence-based unsupervised adaptation approach are presented. For ASLR, we adapt the RWTH Aachen University Speech Recognition (RWTH ASR) framework to account for the multiple modalities important in sign language communication, e.g. hand configuration, place of articulation, hand movement, and hand orientation. Additionally, non-manual components such as facial expression and body posture are analyzed. Most features relevant to sign language require a robust tracking method. We propose a multi-purpose, model-free object tracking framework that is based on dynamic programming (DP) and is applied to hand and head tracking tasks in ASLR. In particular, a context-dependent optimization of tracking decisions over time allows occluded objects to be tracked robustly. The algorithm is inspired by the time alignment algorithm in speech recognition, which is guaranteed to find the optimal path w.r.t. a given criterion and prevents taking possibly wrong local ...
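
The abstract's point about finding an optimal path over time, instead of committing to one decision per frame, can be illustrated with a minimal sketch. This is not the thesis's tracking algorithm: the function name `track_best_path`, the 1-D candidate positions, and the linear `jump_penalty` are hypothetical choices made only for illustration, and the recursion is a generic Viterbi-style dynamic program.

```python
def track_best_path(scores, positions, jump_penalty=1.0):
    """Pick one candidate per frame so that the summed score is maximal.

    scores[t][i]    -- local score of candidate i in frame t (higher is better)
    positions[t][i] -- 1-D position of that candidate (illustrative assumption)
    jump_penalty    -- cost per unit of movement between consecutive frames
    """
    num_frames = len(scores)
    best = [list(scores[0])]            # best[t][i]: best total score ending in i
    back = [[-1] * len(scores[0])]      # back[t][i]: predecessor index for traceback

    for t in range(1, num_frames):
        row, ptr = [], []
        for i, local in enumerate(scores[t]):
            # Score of arriving at candidate i from every candidate j in frame t-1.
            arrivals = [
                best[t - 1][j] - jump_penalty * abs(positions[t][i] - positions[t - 1][j])
                for j in range(len(scores[t - 1]))
            ]
            j_best = max(range(len(arrivals)), key=arrivals.__getitem__)
            row.append(arrivals[j_best] + local)
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the best final candidate to recover the whole path.
    i = max(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][i]
    path = [i]
    for t in range(num_frames - 1, 0, -1):
        i = back[t][i]
        path.append(i)
    path.reverse()
    return path, total


# Frame 1 slightly prefers candidate 1 (e.g. during an occlusion), but jumping
# there and back is expensive, so the globally best path stays on candidate 0.
scores = [[1.0, 0.0], [0.4, 0.5], [1.0, 0.0]]
positions = [[0, 10], [0, 10], [0, 10]]
path, total = track_best_path(scores, positions)
```

In this toy input a frame-by-frame decision would pick candidate 1 in the middle frame, while the traceback over the whole sequence keeps candidate 0 throughout.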