Deep Fake Image Detection Using Vision Transformer with Random Oversampling Technique
Dipo Paudro Tirto Prakoso, Sugiyanto Sugiyanto


Introduction

This work detects deepfake images using a Vision Transformer (ViT) combined with a Random Oversampling technique, achieving 94.46% accuracy while addressing class imbalance and strengthening digital media security.


Abstract

Recent developments in deep learning have facilitated the generation of visually convincing deepfake images, creating serious concerns for the reliability and security of digital media content. The primary challenge lies in detecting these sophisticated manipulations while handling imbalanced datasets, a common issue in deepfake detection research. This research focuses on designing a robust deepfake image classification model based on the Vision Transformer (ViT) architecture to differentiate between authentic and manipulated images. The main objectives are to: (1) adapt and fine-tune a pre-trained Vision Transformer for binary classification, (2) evaluate the effectiveness of Random Oversampling in addressing class imbalance while preventing data leakage, and (3) assess model performance using comprehensive metrics.

Methods: A pre-trained Vision Transformer model (Deep-Fake-Detector-v2-Model) was adapted and fine-tuned on a dataset of 190,335 images. The dataset was divided into training and testing subsets in an 80:20 ratio, and to overcome class imbalance a Random Oversampling strategy was applied exclusively to the training set after this split, preventing data leakage. During the training phase, data augmentation techniques such as image rotation, sharpness variation, and pixel normalization were employed. The model was trained for four epochs with a learning rate of 1×10⁻⁶ and a batch size of 32.

Results: The proposed model achieves a classification accuracy of 94.46% on the test dataset, with high precision of 97.56% for fake images and 91.74% for real images, and corresponding recall rates of 91.21% and 97.72% respectively. The F1-score reaches 94.46% for both classes, indicating balanced performance.
Novelty: This research presents a novel application of Vision Transformer architecture for deepfake detection, combining efficient transfer learning with strategic oversampling to handle imbalanced datasets while preventing data leakage. The study demonstrates that ViT-based models can effectively capture subtle manipulation artifacts in deepfake images, achieving superior performance compared to traditional convolutional neural network approaches.
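The leakage-safe oversampling procedure the abstract describes can be sketched in plain Python. The 80:20 split ratio and the split-before-oversample ordering come from the paper; the helper name, toy file paths, and class counts below are illustrative assumptions, not the authors' actual pipeline:

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the
    majority count -- Random Oversampling, applied only to the
    training split so the test split is never touched."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(v) for v in by_class.values())
    out = []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out.extend((s, y) for s in items + extra)
    rng.shuffle(out)
    return [s for s, _ in out], [y for _, y in out]

# Toy stand-in for the image dataset: 80 "real" (0) vs 20 "fake" (1) paths.
data = [(f"img_{i:03d}.jpg", 0 if i < 80 else 1) for i in range(100)]
random.Random(42).shuffle(data)

# Split FIRST (80:20, as in the paper), THEN oversample only the
# training portion -- this ordering is what prevents data leakage.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]
train_x, train_y = random_oversample(*zip(*train))
```

Oversampling before the split would copy training images into the test set and inflate the reported accuracy, which is exactly the leakage the paper's ordering avoids.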


Review

This paper presents a timely and relevant contribution to the critical field of deepfake image detection, addressing the growing concerns about the reliability of digital media. The authors propose an innovative approach that leverages the Vision Transformer (ViT) architecture, a state-of-the-art deep learning model for image processing, combined with a strategically implemented Random Oversampling technique. The core objective is to create a robust classification model capable of distinguishing between authentic and manipulated images, while also effectively managing the common challenge of imbalanced datasets. The abstract reports impressive performance metrics, indicating a highly effective solution to a significant real-world problem.

The methodological rigor is a significant strength of this research. The fine-tuning of a pre-trained Vision Transformer is a sound choice, allowing the model to effectively learn and identify the subtle manipulation artifacts often present in deepfake images. A crucial aspect highlighted is the careful application of Random Oversampling exclusively to the training set *after* the initial data split, a correct procedure that prevents data leakage and ensures a more reliable and unbiased evaluation of the model's generalization capabilities. Furthermore, the use of comprehensive evaluation metrics, including accuracy, precision, recall, and F1-score for both classes, provides a thorough and balanced assessment of the model's performance, particularly important when dealing with potential class imbalance.

While the presented results are promising, the abstract could benefit from additional details to fully contextualize the findings. Specifically, providing more information regarding the composition and diversity of the 190,335-image dataset (e.g., the types of deepfake generation methods represented, source datasets) would enhance understanding of the model's generalizability across various deepfake forms. Although the paper claims superior performance compared to traditional convolutional neural networks, the abstract lacks explicit comparative results or a baseline to fully substantiate this claim. Future work could further strengthen the research by including an ablation study to quantify the individual contributions of the Vision Transformer architecture and the oversampling strategy, and by clarifying the source or nature of the "Deep-Fake-Detector-v2-Model" used for pre-training.
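The per-class precision, recall, and F1 figures the review discusses follow from a confusion matrix via the standard formulas. The counts below are hypothetical, chosen only to illustrate the computation, not the paper's actual confusion matrix:

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class of a binary classifier.

    tp: this class correctly predicted; fp: other class predicted as
    this one; fn: this class predicted as the other one.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for the "fake" class on a 200-image test batch:
# 91 fakes caught (TP), 2 reals flagged as fake (FP), 9 fakes missed (FN).
p, r, f1 = per_class_metrics(tp=91, fp=2, fn=9)
```

Reporting these three values separately for each class, as the paper does, is what makes the evaluation meaningful under class imbalance, where accuracy alone can mask poor minority-class recall.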


Full Text

The full text and download of this article, Deep Fake Image Detection Using Vision Transformer with Random Oversampling Technique, are available from Building of Informatics, Technology and Science (BITS).

