Pemodelan Pola Temporal Action Unit untuk Pengenalan Ekspresi Wajah Berbasis Bidirectional LSTM (Modeling Temporal Action Unit Patterns for Facial Expression Recognition Based on Bidirectional LSTM). This work develops a facial expression recognition system based on Action Units (AU) and a BiLSTM model, achieves temporal stability with 96.61% label consistency across emotion classes, and accelerates the creation of automatically labeled datasets.
This study develops a facial expression recognition system based on facial Action Unit (AU) data using a Bidirectional Long Short-Term Memory (BiLSTM) model. The dataset consists of AU data provided by the research supervisor, sourced from DCAP-SWOZ (USC Institute for Creative Technologies), a multimodal corpus containing AU values extracted from human interaction videos. A total of 188 AU files were used in this research. Initial labeling was performed using Facial Action Coding System (FACS)-based rules, producing pseudo-labels that serve as a starting point for training the BiLSTM model. This approach was chosen because the dataset lacks inherent emotion labels, necessitating a label initialization mechanism. The BiLSTM model functions as a temporal smoother designed to reduce the noise and label inconsistencies that commonly occur in frame-by-frame rule-based approaches. The trained model then performs inference on the same data to generate final labels with improved temporal stability. Evaluation was conducted by measuring model consistency against the FACS rules and by qualitative analysis of the temporal stability of the generated labels. Data were processed into 30-frame sequences with a 1-frame sliding window to effectively capture expression dynamics. The BiLSTM model was trained with two hidden layers and dropout regularization. Evaluation results showed 96.61% consistency against the FACS rules, with high per-class consistency across all emotion classes: anger (99.11%), disgust (97.98%), fear (94.08%), happiness (99.29%), neutral (96.42%), sadness (98.31%), and surprise (99.16%). Qualitative analysis demonstrated that the model reduced frame-by-frame label fluctuations by 73% compared to the pure rule-based approach, producing more stable and realistic emotion segmentation.
These results demonstrate that the combination of FACS-based labeling and the BiLSTM model can produce a temporally consistent automated labeling system capable of accelerating labeled dataset creation, although validation against human ground truth remains necessary as future research.
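The pipeline described above — frame-level FACS-rule pseudo-labeling followed by slicing the AU streams into 30-frame sequences with a 1-frame sliding window — can be sketched as follows. The AU names, activation thresholds, and the 17-dimensional feature size are illustrative assumptions; the paper does not specify its exact rule set.

```python
import numpy as np

# Illustrative FACS-style rule: label a frame "happiness" when AU06 (cheek
# raiser) and AU12 (lip corner puller) are both active. Thresholds and the
# choice of AUs are hypothetical, not the paper's actual rules.
def pseudo_label_frame(au: dict) -> str:
    if au.get("AU06", 0.0) > 1.0 and au.get("AU12", 0.0) > 1.0:
        return "happiness"
    if au.get("AU04", 0.0) > 1.0 and au.get("AU05", 0.0) > 1.0:
        return "fear"
    return "neutral"

def make_sequences(frames: np.ndarray, seq_len: int = 30, stride: int = 1):
    """Slice a (num_frames, num_aus) array into overlapping windows,
    matching the 30-frame / 1-frame-stride setup in the abstract."""
    return np.stack([frames[i:i + seq_len]
                     for i in range(0, len(frames) - seq_len + 1, stride)])

rng = np.random.default_rng(0)
X = rng.random((100, 17))       # 100 frames of 17 AU intensities (assumed size)
windows = make_sequences(X)     # shape: (71, 30, 17)
```

The 1-frame stride means consecutive windows overlap in 29 of their 30 frames, which is what lets the downstream BiLSTM smooth labels across neighboring frames.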
This paper addresses a critical challenge in facial expression recognition: the creation of temporally consistent and accurately labeled datasets, particularly when relying on Action Unit (AU) data that often lacks inherent emotion labels. The authors propose an innovative approach leveraging a Bidirectional Long Short-Term Memory (BiLSTM) model to model temporal patterns of AUs for robust facial expression recognition. Their core contribution lies in developing a system that automates the labeling process, starting from FACS-based pseudo-labels, to overcome the limitations of purely rule-based, frame-by-frame annotation which often suffers from temporal instability and noise. This method is particularly relevant for accelerating the development of large-scale, high-quality datasets for training more sophisticated facial expression recognition models. The methodology employs a BiLSTM model to act as a temporal smoother, specifically designed to mitigate noise and inconsistencies arising from initial FACS-based rule pseudo-labeling. The study utilized AU data from the DCAP-SWOZ multimodal corpus, comprising 188 AU files, which were processed into 30-frame sequences with a 1-frame sliding window to capture dynamic expression patterns effectively. The BiLSTM architecture featured two hidden layers and dropout regularization, a standard yet effective configuration for temporal sequence modeling. This systematic approach, from initial pseudo-labeling to BiLSTM-driven temporal refinement, demonstrates a well-considered strategy to generate more stable and realistic emotion segmentations in the absence of direct human-annotated ground truth for the raw AU data. The evaluation results are highly promising, demonstrating 96.61% consistency against the initial FACS rules, with remarkable performance across individual emotion classes, consistently exceeding 94% accuracy. 
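The two-hidden-layer BiLSTM with dropout described here could look like the following minimal PyTorch sketch. Only "two hidden layers with dropout," the 30-frame input windows, and the seven emotion classes come from the text; the hidden size (64), dropout rate (0.3), AU feature dimension (17), and per-frame classification head are assumptions.

```python
import torch
import torch.nn as nn

class AUBiLSTM(nn.Module):
    """Sketch of a two-layer bidirectional LSTM temporal smoother over AU
    sequences. Hyperparameters are illustrative, not the paper's values."""
    def __init__(self, num_aus: int = 17, hidden: int = 64,
                 num_classes: int = 7, dropout: float = 0.3):
        super().__init__()
        self.bilstm = nn.LSTM(num_aus, hidden, num_layers=2,
                              batch_first=True, bidirectional=True,
                              dropout=dropout)
        # Forward and backward directions are concatenated, hence 2 * hidden.
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):              # x: (batch, 30, num_aus)
        out, _ = self.bilstm(x)        # (batch, 30, 2 * hidden)
        return self.head(out)          # per-frame logits: (batch, 30, 7)

model = AUBiLSTM()
logits = model(torch.randn(4, 30, 17))   # shape: (4, 30, 7)
```

Because the LSTM is bidirectional, each frame's label depends on both past and future context within the window, which is the mechanism that damps the frame-by-frame fluctuations of the rule-based labels.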
Crucially, the qualitative analysis highlights a significant achievement: a 73% reduction in frame-by-frame label fluctuations compared to pure rule-based methods, leading to substantially more stable and realistic emotion segmentation. These findings strongly support the paper's claim that combining FACS-based pseudo-labeling with a BiLSTM model can effectively create a temporally consistent automated labeling system. While the current evaluation relies on consistency with FACS rules, the authors appropriately acknowledge that validation against human ground truth is a necessary future step, which will further solidify the practical applicability and generalizability of this impressive automated annotation framework.
Full text available from Building of Informatics, Technology and Science (BITS).
By Sciaria