Face recognition technology has been well investigated in past decades and widely deployed in many real-world applications. However, low-resolution face recognition remains a challenging task in resource-constrained edge computing environments such as Internet of Video Things (IoVT) applications. For instance, low-resolution images are common in surveillance video streams, in which scarce visual information, variable viewing angles, and lighting conditions create difficulties for recognition tasks. To address these problems, we optimized the correlation feature face recognition (CoFFaR) method and conducted experimental studies in two data preparation modes, symmetric and exhaustive arranging. The experimental results show that the CoFFaR method achieves an accuracy of over 82.56%, and the two-dimensional (2D) feature points after dimension reduction are uniformly distributed in a diagonal pattern. The analysis leads to the conclusion that the data augmentation brought by exhaustive arranging data preparation can effectively improve performance, whereas the constraint that pulls feature vectors closer to their clustering centers yields no apparent improvement in the model's identification accuracy.
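The difference between the two data preparation modes can be illustrated with a minimal sketch. Assuming the preparation step enumerates pairs of face samples (the exact CoFFaR pairing procedure is not given in the abstract, so the functions below are illustrative), symmetric arranging keeps each unordered pair once, while exhaustive arranging keeps every ordered arrangement, roughly doubling the training pairs:

```python
from itertools import combinations, product

def symmetric_pairs(samples):
    # Symmetric arranging: each unordered pair appears exactly once.
    return list(combinations(samples, 2))

def exhaustive_pairs(samples):
    # Exhaustive arranging: every ordered arrangement (a, b) with a != b,
    # which acts as a form of data augmentation.
    return [(a, b) for a, b in product(samples, repeat=2) if a != b]

samples = ["face0", "face1", "face2"]
print(len(symmetric_pairs(samples)))   # 3
print(len(exhaustive_pairs(samples)))  # 6
```

For n samples, symmetric arranging yields n(n-1)/2 pairs while exhaustive arranging yields n(n-1), which is consistent with the augmentation advantage the analysis attributes to the exhaustive mode.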
The rapid advancement of multimedia content editing tools has made it increasingly easy for malicious actors to manipulate real-time multimedia data streams, spanning both audio and video. Among notorious cybercrimes, replay attacks have become widespread, necessitating more efficient authentication methods for their detection. A cutting-edge authentication technique leverages Electrical Network Frequency (ENF) signals embedded within multimedia content. ENF signals offer a range of advantageous attributes, including uniqueness, unpredictability, and randomness, rendering them highly effective for detecting replay attacks. To counter attackers who may seek to deceive detection systems by embedding fake ENF signals, this study harnesses the growing accessibility of deep Convolutional Neural Networks (CNNs). These CNNs are not only deployable on platforms with limited computational resources, such as Single-Board Computers (SBCs), but are also capable of swiftly identifying interference within a signal by learning distinctive spatio-temporal patterns. In this paper, we explore applying a Computationally Efficient Deep Learning Model (CEDM) as a powerful tool for rapidly detecting potential fabrications within ENF signals originating from diverse audio sources. Our experimental study validates the effectiveness of the proposed method.
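The abstract does not specify how the ENF trace is extracted before it reaches the CNN; a common baseline, sketched here purely as an assumption, tracks the short-time spectral peak near the nominal grid frequency (60 Hz in North America) frame by frame:

```python
import numpy as np

def estimate_enf(signal, fs, nominal=60.0, frame_len=1.0, band=1.0):
    # Short-time spectral peak tracking near the nominal grid frequency.
    n = int(frame_len * fs)
    enf = []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n] * np.hanning(n)
        spec = np.abs(np.fft.rfft(frame))
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        mask = (freqs >= nominal - band) & (freqs <= nominal + band)
        enf.append(freqs[mask][np.argmax(spec[mask])])
    return np.array(enf)

# Synthetic 60.02 Hz mains hum sampled at 1 kHz for 5 seconds.
fs = 1000
t = np.arange(0, 5, 1.0 / fs)
hum = np.sin(2 * np.pi * 60.02 * t)
print(estimate_enf(hum, fs).mean())  # 60.0
```

The resulting per-frame frequency estimates form the one-dimensional trace in which a detector can look for embedded or fabricated ENF content; finer frequency resolution would require longer frames or interpolation around the peak.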
In an era characterized by the prolific generation of digital imagery through advanced artificial intelligence, the need for reliable methods to distinguish actual photographs from AI-generated ones has become paramount. The ever-increasing ubiquity of AI-generated imagery, which seamlessly blends with authentic photographs, raises concerns about misinformation and trustworthiness. Authenticating these images has taken on critical significance in various domains, including journalism, forensic science, and social media. Traditional methods of image authentication often struggle to adapt to the increasingly sophisticated nature of AI-generated content. In this context, frequency domain analysis emerges as a promising avenue due to its effectiveness in uncovering subtle discrepancies and patterns that are less apparent in the spatial domain. Addressing the imperative task of imagery authentication, this paper introduces a novel Generative Adversarial Network (GAN)-based AI-generated Imagery Authentication (GANIA) method using frequency domain analysis. By exploiting the inherent differences in frequency spectra, GANIA uncovers unique signatures that are difficult to replicate, ensuring the integrity and authenticity of visual content. By training GANs on vast datasets of real images, we create AI-generated counterparts that closely mimic the characteristics of authentic photographs. This approach enables us to construct a challenging and realistic dataset, ideal for evaluating the efficacy of frequency domain analysis techniques in image authentication. Our work not only highlights the potential of frequency domain analysis for image authentication but also underscores the importance of adopting generative AI approaches in studying this critical topic. Through this fusion of AI and frequency domain analysis, we contribute to advancing image forensics and preserving trust in visual information in an AI-driven world.
The information era has flourished thanks to abundant digital media content delivered through broadcasting technologies. Among information providers, social media platforms remain a popular channel for the widespread reach of digital content. Along with accessibility and reach, however, social media platforms are also a huge venue for spreading misinformation, since the data is not curated by trusted authorities. With many malicious participants involved, artificially generated media or strategically altered content can compromise the integrity of targeted organizations. Popular content generation tools like DeepFake allow perpetrators to create realistic media content by manipulating the targeted subject with a fake identity or actions. Media metadata, such as time and location information, can be altered to create a false perception of real events. In this work, we propose a Decentralized Electrical Network Frequency (ENF)-based Media Authentication (DEMA) system to verify media metadata and digital multimedia integrity. Leveraging the environmental ENF fingerprint captured by digital media recorders, altered media content is detected by exploiting the ENF's consistency with the time and location of recording, along with its spatial consistency throughout the captured frames. A decentralized, hierarchical ENF map is created as a reference database for time and location verification. For digital media uploaded to a broadcasting service, the proposed DEMA system correlates the underlying ENF fingerprint with the stored ENF map to authenticate the media metadata. With the media metadata intact, the embedded ENF in the recording is compared with a reference ENF based on the time of recording, and a correlation-based metric is used to evaluate the media's authenticity. In case of missing metadata, the frames are divided spatially to compare the ENF consistency throughout the recording.
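The correlation-based metric mentioned above can be sketched as a sliding normalized cross-correlation between an extracted ENF trace and a reference trace from the ENF map; the function below is a hypothetical illustration of that idea, not DEMA's exact metric, and the 600-sample reference trace is synthetic:

```python
import numpy as np

def enf_correlation(query, reference):
    # Slide the query ENF trace over the reference and return the peak
    # normalized correlation coefficient and the offset where it occurs.
    q = (query - query.mean()) / (query.std() + 1e-12)
    best, best_off = -1.0, 0
    for off in range(len(reference) - len(query) + 1):
        r = reference[off:off + len(query)]
        r = (r - r.mean()) / (r.std() + 1e-12)
        c = float(np.mean(q * r))
        if c > best:
            best, best_off = c, off
    return best, best_off

# Synthetic reference: small random fluctuations around 60 Hz.
rng = np.random.default_rng(0)
ref = 60.0 + 0.02 * rng.standard_normal(600)
score, offset = enf_correlation(ref[120:240], ref)
print(offset, round(score, 2))  # 120 1.0
```

A high peak correlation at the offset implied by the claimed recording time supports the metadata; a low peak, or a peak at an inconsistent offset, flags possible tampering.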
Ever since human society entered the age of social media, every user has had a considerable amount of visual content stored online and shared across various virtual communities. Because images are such an efficient means of circulating information, disastrous consequences are possible if their contents are tampered with by malicious actors. Specifically, we are witnessing the rapid development of machine learning (ML) based tools like DeepFake apps, which are capable of exploiting images on social media platforms to mimic a potential victim without their knowledge or consent. These content manipulation attacks can lead to the rapid spread of misinformation that may not only mislead friends or family members but also cause chaos in public domains. Therefore, robust image authentication is critical to detect and filter out manipulated images. In this paper, we introduce a system that accurately AUthenticates SOcial MEdia images (AUSOME) uploaded to online platforms, leveraging spectral analysis and ML. Images from DALL-E 2 are compared with genuine images from the Stanford image dataset. The Discrete Fourier Transform (DFT) and Discrete Cosine Transform (DCT) are used to perform a spectral comparison. Additionally, based on the differences in their frequency responses, an ML model is proposed to classify social media images as genuine or AI-generated. Using real-world scenarios, the AUSOME system is evaluated on its detection accuracy. The experimental results are encouraging, and they verify the potential of the AUSOME scheme in social media image authentication.
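As a rough illustration of the spectral-comparison idea (the exact DFT/DCT features AUSOME uses are not specified in the abstract), a radially averaged log-magnitude spectrum is one common way to summarize an image's frequency response as a compact vector for a downstream classifier:

```python
import numpy as np

def log_spectrum_features(img, n_bins=16):
    # Radially averaged log-magnitude DFT spectrum: a compact frequency
    # "signature" that can feed a genuine-vs-generated classifier.
    f = np.fft.fftshift(np.fft.fft2(img))
    mag = np.log1p(np.abs(f))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    bins = np.linspace(0, r.max() + 1e-9, n_bins + 1)
    return np.array([mag[(r >= lo) & (r < hi)].mean()
                     for lo, hi in zip(bins[:-1], bins[1:])])

# A constant image concentrates all energy at DC (the first radial bin).
feats = log_spectrum_features(np.ones((64, 64)))
print(feats.shape, feats[0] > feats[-1])  # (16,) True
```

Genuine photographs and generator outputs tend to differ in how energy decays across these radial bins, which is the kind of discrepancy a classifier trained on such features can exploit.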
Modern infrastructure development has led to a rise in deployed surveillance cameras to monitor remote locations and widespread infrastructures. In today's networked surveillance environment, however, human operators are often overwhelmed by the sheer volume of visual feeds, which leads to poor judgment and delayed responses to emergencies. This paper proposes a distributed crawler scheme (DiCrawler) for smart surveillance systems deployed on the Internet of Video Things (IoVT). The IoVT camera nodes monitor continuous video input, track the object of interest while preserving privacy, and relay correlative information to targeted nodes for constant monitoring. Each IoVT node monitors the space inside its field of view (FoV) and notifies the neighboring nodes about objects leaving the FoV and heading in their direction. A smart communication algorithm among IoVT nodes is designed to prevent network bandwidth bottlenecks and preserve computational power. The DiCrawler system can collaborate with human operators and assist with decision-making by raising alarms in case of suspicious behavior. The IoVT network is completely decentralized, using only peer-to-peer (P2P) communication. DiCrawler does not rely on a central server for any computations, preventing a potential bottleneck if hundreds of cameras were connected and constantly uploading data to a server. Each module is also in a compact form factor, making it viable to be mounted on traditional security surveillance cameras. An extensive experimental study on a proof-of-concept prototype validated the effectiveness of the DiCrawler design.
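The neighbor-notification step plausibly carries only compact metadata rather than raw video, consistent with the bandwidth and privacy goals above; the message format below is purely hypothetical, since the abstract does not define DiCrawler's wire format:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class HandoffMessage:
    # Hypothetical P2P handoff payload: only compact metadata is sent,
    # never raw frames, conserving bandwidth and preserving privacy.
    object_id: str      # anonymized tracker ID, not an identity
    exit_edge: str      # "north" / "south" / "east" / "west"
    heading_deg: float  # estimated direction of travel
    timestamp: float    # time the object left the FoV

msg = HandoffMessage("obj-17", "east", 92.5, 1700000000.0)
payload = json.dumps(asdict(msg))
print(json.loads(payload)["exit_edge"])  # east
```

A few hundred bytes per handoff, sent only to the neighbors the object is heading toward, is what keeps such a scheme from creating the bandwidth bottlenecks a centralized upload model would suffer.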
Deep neural networks (DNNs) have been studied intensively in recent years, leading to many practical applications. However, there are also concerns about the security problems and vulnerabilities of DNNs. Studies on adversarial network development have shown that relatively minor perturbations can impact DNN performance and manipulate its outcome. These impacts have led to the development of advanced techniques for generating image-level perturbations. Once embedded in a clean image, these perturbations are not perceptible to human eyes yet fool a well-trained deep learning (DL) convolutional neural network (CNN) classifier. This work introduces a new Critical-Pixel Iterative (CriPI) algorithm, built on a thorough study of critical pixels' characteristics. The proposed CriPI algorithm can identify the critical pixels and generate one-pixel attack perturbations with much higher efficiency. Compared to a one-pixel attack benchmark algorithm, CriPI reduces the attack time from seven minutes to one minute with similar success rates.
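The abstract does not detail how CriPI selects critical pixels; as a point of reference, the kind of search-based baseline it improves upon can be sketched as a naive random search over single-pixel edits, scored against the classifier's confidence (the toy score function here stands in for a real CNN):

```python
import numpy as np

def one_pixel_attack(img, score_fn, n_trials=200, seed=0):
    # Random-search baseline for a one-pixel attack: try single-pixel
    # edits and keep the one that lowers the classifier's score most.
    # (A stand-in for CriPI's critical-pixel selection, which targets
    # influential pixels instead of sampling blindly.)
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    best_img, best_score = img, score_fn(img)
    for _ in range(n_trials):
        cand = img.copy()
        y, x = rng.integers(h), rng.integers(w)
        cand[y, x] = rng.random(img.shape[2:])
        s = score_fn(cand)
        if s < best_score:
            best_img, best_score = cand, s
    return best_img, best_score

# Toy "classifier": score is simply the mean intensity of the image.
img = np.full((8, 8), 0.5)
adv, score = one_pixel_attack(img, lambda im: im.mean())
print(score < 0.5)  # True
```

Because each trial requires a full classifier evaluation, the attack's runtime scales with the number of candidate pixels tried, which is why identifying critical pixels directly, as CriPI does, yields the reported seven-minute-to-one-minute speedup.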
One of the major restrictions on the practical application of unmanned aerial vehicles (UAVs) is their incomplete self-sufficiency, which makes continuous operations infeasible without human oversight. The more oversight UAVs require, the less likely they are to be commercially advantageous compared to their alternatives. For an autonomous system, the amount of human interaction needed to function is one of the best indicators of its limitations and inefficiencies. Popular UAV-related research areas, such as path planning and computer vision, have enabled substantial advances in the ability of drones to act on their own. This research focuses on in-flight operations, where little reported effort has tackled the problem from the perspective of supportive infrastructure. In this paper, an Autonomous Service network infrastructure (AutoServe) is proposed. Aiming to increase the future autonomy of UAVs, the AutoServe system includes a service-oriented landing platform and a customized communication protocol. This supportive AutoServe infrastructure will automate many tasks currently done manually by human operators, such as battery replacement. A proof-of-concept prototype has been built, and a simulation-based experimental study validated the design.
Video Surveillance Systems (VSS) have become an essential infrastructural element of smart cities, increasing public safety and countering criminal activities. A VSS is normally deployed in a secure network to prevent access by unauthorized personnel. Compared to traditional systems that continuously record video regardless of the actions in the frame, a smart VSS has the capability of capturing video data upon motion or object detection, then extracting essential information and sending it to users. This increasing design complexity of the surveillance system, however, also introduces new security vulnerabilities. In this work, a smart, real-time frame duplication attack is investigated. We show the feasibility of forging video streams in real time as the camera's surroundings change. The generated frames are compared constantly and instantly to identify changes in pixel values that could represent motion or changes in outdoor light intensity. An attacker (intruder) can remotely trigger the replay of previously duplicated video streams, manually or automatically, via a special quick response (QR) code or when the intruder's face appears in the camera's field of view. A detection technique is proposed that leverages a real-time electrical network frequency (ENF) reference database to match against the power grid frequency.
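The constant frame comparison described above can be sketched as a simple changed-pixel-ratio test, the kind of check an attacker would run to decide when the scene has changed enough to warrant splicing in duplicated footage; the thresholds below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def motion_triggered(prev, curr, pixel_thresh=0.1, ratio_thresh=0.01):
    # Flag a scene change when the fraction of pixels whose intensity
    # moved by more than pixel_thresh exceeds ratio_thresh.
    changed = np.abs(curr.astype(float) - prev.astype(float)) > pixel_thresh
    return changed.mean() > ratio_thresh

a = np.zeros((16, 16))
b = a.copy()
b[0:4, 0:4] = 1.0  # 16 of 256 pixels (6.25%) changed
print(motion_triggered(a, a), motion_triggered(a, b))  # False True
```

Such pixel-level splicing leaves the frame content self-consistent, which is precisely why the proposed defense looks outside the frames, matching the embedded ENF against a real-time power grid reference that a replayed recording cannot reproduce.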