Single-image super-resolution (SISR), which maps a low-resolution observation to a high-resolution image, has been extensively utilized in various computer vision applications. With the advent of convolutional neural networks (CNNs), numerous algorithms have emerged that achieve state-of-the-art results. However, a main drawback of standard CNNs is that they neglect the interrelationship between the RGB color channels. This negligence discards crucial structural information of color and yields a suboptimal representation of color images. Furthermore, most of these CNN-based methods contain millions of parameters and many layers, limiting their practical applications. To overcome these drawbacks, an end-to-end trainable single-image super-resolution method, the Quaternion-based Image Super-Resolution network (QSRNet), which takes advantage of quaternion theory, is proposed in this paper. QSRNet aims at maintaining the local and global interrelationships between the channels and produces high-resolution images with approximately 4x fewer parameters than standard CNNs. Extensive computer experiments were conducted on publicly available benchmark datasets, including DIV2K, Flickr2K, Set5, Set14, BSD100, Urban100, and UEC100, to demonstrate the effectiveness of the proposed QSRNet compared to traditional CNNs.
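The roughly 4x parameter saving comes from the Hamilton product: a quaternion convolution shares four real kernels across the four channel groups where an equivalent real convolution would learn sixteen independent ones. The QSRNet code itself is not reproduced here; the following is a minimal PyTorch sketch of such a quaternion convolution layer (the initialization scale and channel layout are illustrative assumptions, not QSRNet's exact design).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuaternionConv2d(nn.Module):
    """Hamilton-product convolution: 4 shared real kernels replace the 16
    a real conv mapping 4*in_q -> 4*out_q channels would need (4x fewer)."""
    def __init__(self, in_q, out_q, kernel_size, padding=0):
        super().__init__()
        shape = (out_q, in_q, kernel_size, kernel_size)
        self.wr = nn.Parameter(torch.randn(shape) * 0.02)
        self.wi = nn.Parameter(torch.randn(shape) * 0.02)
        self.wj = nn.Parameter(torch.randn(shape) * 0.02)
        self.wk = nn.Parameter(torch.randn(shape) * 0.02)
        self.padding = padding

    def forward(self, x):
        # x: (B, 4*in_q, H, W), channels laid out as [r | i | j | k] blocks
        r, i, j, k = torch.chunk(x, 4, dim=1)
        conv = lambda t, w: F.conv2d(t, w, padding=self.padding)
        # Hamilton product w * x, expanded component-wise
        out_r = conv(r, self.wr) - conv(i, self.wi) - conv(j, self.wj) - conv(k, self.wk)
        out_i = conv(i, self.wr) + conv(r, self.wi) + conv(k, self.wj) - conv(j, self.wk)
        out_j = conv(j, self.wr) - conv(k, self.wi) + conv(r, self.wj) + conv(i, self.wk)
        out_k = conv(k, self.wr) + conv(j, self.wi) - conv(i, self.wj) + conv(r, self.wk)
        return torch.cat([out_r, out_i, out_j, out_k], dim=1)
```

For RGB input, a common convention is to place the three color channels in the imaginary components and zero-pad the real component, so the cross-channel structure is preserved through every layer.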
There are more than 400,000 new cases of kidney cancer each year, and surgery is its most common treatment. Accurate segmentation and characterization of kidneys and kidney tumors is an important step in quantifying a tumor's morphological details to monitor disease progression and improve treatment planning. Segmentation of kidney tumors in CT images is challenging due to low contrast, irregular motion, and diverse shapes and sizes. Furthermore, manual delineation is extremely time-consuming and prone to errors arising from the variability between specialists. The literature reports the application of 3D convolutional neural networks (CNNs) for the segmentation of kidneys and tumors. While effective, 3D CNNs are computationally expensive. Our work proposes a novel 2D CNN architecture to segment kidneys and tumors from CT images. The proposed architecture uses features from enhanced images to improve segmentation performance. Quantitative and qualitative analysis of the proposed model on the KiTS19 dataset shows improvements over recent state-of-the-art architectures.
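The paper's exact architecture is not reproduced here; the sketch below only illustrates the core idea of fusing features computed from an enhanced copy of each CT slice with features from the raw slice. The single-block encoders, channel widths, and three-class head (background/kidney/tumor) are illustrative assumptions.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class DualInputSegNet(nn.Module):
    """Two encoders (raw slice + enhanced slice), fused into one decoder head."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.enc_raw = block(1, 16)
        self.enc_enh = block(1, 16)
        self.fuse = block(32, 32)
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, raw, enhanced):
        # Concatenate the two feature streams channel-wise, then classify
        f = torch.cat([self.enc_raw(raw), self.enc_enh(enhanced)], dim=1)
        return self.head(self.fuse(f))
```

The `enhanced` input could be produced by any contrast-enhancement step, for example CLAHE on the windowed CT slice; the abstract does not specify which enhancement the authors use.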
The dynamics of gaze coordination in natural contexts are affected by various properties of the task, the agent, the environment, and their interaction. Artificial intelligence (AI) lays the foundation for detection, classification, segmentation, and scene analysis, and much of the AI in everyday use is dedicated to predicting people's behavior. However, a purely data-driven approach alone cannot solve these problems. Therefore, it is imperative that decision-makers also consider another AI approach, causal AI, which can help identify precise cause-and-effect relationships. This article presents a novel Gaze Feature Transverse Network (Gaze-FTNet) that generates close-to-human gaze attention. The proposed end-to-end trainable approach leverages a feature transverse network (FTNet) to model long-term dependencies for optimal saliency map prediction. Moreover, several modern backbone architectures are explored, tested, and analyzed. Synthetically predicting human attention from monocular RGB images will benefit several domains, particularly human-vehicle interaction, autonomous driving, and augmented reality.
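The FTNet module itself is not reproduced here; the sketch below shows only the generic saliency setup the abstract builds on: a swappable backbone producing features, a lightweight head upsampled to input resolution, and the KL-divergence objective commonly used for fixation-map prediction. The choice of ResNet-18 and the 1x1 head are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18

class SaliencyNet(nn.Module):
    """Backbone features -> 1-channel map, upsampled to input resolution."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.head = nn.Conv2d(512, 1, kernel_size=1)

    def forward(self, x):
        s = self.head(self.features(x))            # (B, 1, H/32, W/32)
        return F.interpolate(s, size=x.shape[-2:],
                             mode="bilinear", align_corners=False)

def kl_saliency_loss(pred, target, eps=1e-8):
    """KL divergence between predicted and ground-truth attention maps,
    each normalized into a spatial probability distribution."""
    p = torch.softmax(pred.flatten(1), dim=1)
    q = target.flatten(1)
    q = q / (q.sum(dim=1, keepdim=True) + eps)
    return (q * (torch.log(q + eps) - torch.log(p + eps))).sum(dim=1).mean()
```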
Depth estimation is an essential component in understanding the 3D geometry of a scene. Compared to traditional depth estimation methods such as structure from motion and stereo matching, determining depth from a single camera is challenging. Recent advancements in convolutional neural networks have accelerated research in monocular depth estimation. However, most methods infer depth maps from lower-resolution images due to network capacity and complexity constraints. Another challenge in depth estimation is ambiguous and sparse depth maps, caused by labeling errors, hardware faults, or occlusions. This paper presents a novel end-to-end trainable convolutional neural network architecture, the depth transverse transformer network (DTTNet), designed and optimized for monocular depth estimation. The network exploits multi-resolution representations to perform pixel-wise depth estimation more accurately, and several ad hoc sub-networks are subsequently proposed to further improve accuracy. Extensive computer simulations on the NYU Depth V2 and SUN RGB-D datasets demonstrate the effectiveness of the proposed DTTNet against state-of-the-art methods. DTTNet can potentially improve depth perception in intelligent systems such as automated driving, video surveillance, computational photography, and augmented reality. The source code is available at https://github.com/shreyaskamathkm/DTTNet
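DTTNet's actual layers live in the linked repository; the sketch below only illustrates the two ingredients the abstract highlights: fusing multi-resolution features for pixel-wise prediction, and handling sparse/ambiguous ground truth with a validity mask. The tiny three-stage encoder is a placeholder, and the scale-invariant log loss (Eigen et al.) is a common monocular-depth objective, not necessarily DTTNet's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleDepth(nn.Module):
    """Fuse coarse-to-fine feature maps into one full-resolution depth map."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 16, 3, stride=2, padding=1)   # 1/2 scale
        self.enc2 = nn.Conv2d(16, 32, 3, stride=2, padding=1)  # 1/4 scale
        self.enc3 = nn.Conv2d(32, 64, 3, stride=2, padding=1)  # 1/8 scale
        self.head = nn.Conv2d(16 + 32 + 64, 1, 3, padding=1)

    def forward(self, x):
        f1 = F.relu(self.enc1(x))
        f2 = F.relu(self.enc2(f1))
        f3 = F.relu(self.enc3(f2))
        size = x.shape[-2:]
        up = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
              for f in (f1, f2, f3)]
        # Softplus keeps predicted depth strictly positive for the log loss
        return F.softplus(self.head(torch.cat(up, dim=1))) + 1e-3

def silog_loss(pred, target, valid, lam=0.85):
    """Scale-invariant log loss; `valid` masks the sparse or missing
    ground-truth pixels the abstract mentions."""
    d = (torch.log(pred) - torch.log(target))[valid]
    return (d ** 2).mean() - lam * d.mean() ** 2
```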
Facial emotion recognition technology finds numerous real-life applications in virtual learning, cognitive psychology analysis, avatar animation, neuromarketing, human-machine interaction, and entertainment systems. Most state-of-the-art techniques focus primarily on visible-spectrum information for emotion recognition. This becomes very arduous, as the emotions of individuals vary significantly. Moreover, visible images are susceptible to variation in illumination: low lighting, pose variation, aging, and disguise have a substantial impact on image appearance and textural information. Even with the great advances made in the field, facial emotion recognition using existing techniques often remains unsatisfactory compared to human performance. To overcome these shortcomings, thermal images are preferred to visible images, because thermal images a) are less sensitive to lighting conditions, b) have consistent thermal signatures, and c) capture the temperature distribution formed by facial vein branches. This paper proposes a robust emotion recognition system using thermal images, TERNet. To accomplish this, a customized convolutional neural network (CNN) is employed, which possesses excellent generalization capabilities. The architecture adopts features obtained via transfer learning from the VGG-Face CNN model, which is further fine-tuned with thermal expression face data from the TUFTS face database. Computer simulations demonstrate an accuracy of 96.2%, which compares favorably with state-of-the-art models.
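The transfer-learning recipe the abstract describes can be sketched in a few lines. Note the assumptions: VGG-Face weights are not shipped with torchvision, so plain ImageNet VGG16 stands in for them below; the number of frozen blocks and the seven emotion classes are illustrative, not TERNet's published configuration.

```python
import torch.nn as nn
from torchvision.models import vgg16

def build_ternet_like(n_emotions=7):
    """Stand-in for the TERNet recipe: start from a pretrained VGG backbone
    (VGG-Face in the paper; ImageNet VGG16 here as a placeholder), freeze
    the early convolutional blocks, and retrain a new classifier head on
    thermal expression images."""
    net = vgg16(weights="IMAGENET1K_V1")
    for p in net.features[:17].parameters():   # freeze conv blocks 1-3
        p.requires_grad = False
    net.classifier[6] = nn.Linear(4096, n_emotions)  # new emotion head
    return net
```

Only the unfrozen layers and the new head are updated during fine-tuning, which is what lets a small thermal dataset such as TUFTS train a VGG-sized model without overfitting.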
Security surveillance systems are low-cost, ubiquitous systems employed in smart cities around the world for threat monitoring and assessment. Manually observing crowded places, monitoring and tracking their population, and detecting and reporting abnormal events can be very challenging. Smart cities therefore favor sophisticated security systems that can reduce human error. However, multi-view near-infrared surveillance systems pose challenges such as poor image quality, color discontinuity, occlusion, and image blur, and the performance of a recognition system also depends on the specifications of the camera. All these distortions interfere with the feature extraction process in face or object classification systems. In this article, an intelligent multi-view image mosaicking algorithm that combines near-infrared images captured from dozens of cameras/sensors is introduced. The presented system a) preserves facial features, b) avoids vertical banding (exposure variation), and c) resolves color discontinuity, aiding face detection systems. The performance of this technique is tested against its ground truth, both subjectively and quantitatively. The quantitative analysis is performed using measures such as SSIM, MS-SSIM, AME, LogAMEE, and TDMEC.
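The abstract does not disclose the mosaicking algorithm itself; the sketch below shows one standard remedy for the vertical-banding problem it names: per-pair gain compensation followed by feathered blending across the overlap. The horizontal-overlap layout and grayscale NIR input are assumptions made for illustration.

```python
import numpy as np

def mosaic_pair(left, right, overlap):
    """Gain-compensate and linearly blend two horizontally overlapping NIR
    frames (grayscale uint8, same height). `overlap` is in pixels."""
    L = left.astype(np.float64)
    R = right.astype(np.float64)
    # Per-pair gain so mean brightness agrees inside the shared strip,
    # which removes the exposure step at the seam
    gain = L[:, -overlap:].mean() / max(R[:, :overlap].mean(), 1e-6)
    R *= gain
    # Linear feathering across the overlap avoids visible vertical banding
    alpha = np.linspace(1.0, 0.0, overlap)[None, :]
    seam = alpha * L[:, -overlap:] + (1.0 - alpha) * R[:, :overlap]
    out = np.hstack([L[:, :-overlap], seam, R[:, overlap:]])
    return np.clip(out, 0, 255).astype(np.uint8)
```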
Biometric evidence plays a crucial role in crime scene analysis. Forensic prints can be extracted from any solid surface, such as firearms, doorknobs, carpets, and mugs. Prints such as fingerprints, palm prints, footprints, and lip prints can be classified into patent, latent, and three-dimensional plastic prints. Traditionally, law enforcement officers capture these forensic traits using an electronic device or extract them manually and save the data electronically using special scanners. The reliability and accuracy of the method depend on the ability of the officer or the electronic device to extract and analyze the data. Furthermore, the 2-D acquisition and processing system is laborious and cumbersome, which can increase false positive and false negative rates in print matching. In this paper, a method and system to extract forensic prints from any surface, irrespective of its shape, is presented. First, a suitable 3-D camera is used to capture images of the forensic print; the 3-D image is then processed and unwrapped to obtain a 2-D equivalent biometric print. Computer simulations demonstrate the effectiveness of using 3-D technology for biometric matching of fingerprints, palm prints, and lip prints. This system can be further extended to other biometric and non-biometric modalities.
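The unwrapping step depends on the surface geometry, which the abstract leaves open. As a toy illustration only, the sketch below flattens a print lifted from a roughly cylindrical object (a mug or doorknob handle) by mapping each 3-D point to angle-around-the-axis and height; the axis orientation, normalization, and nearest-point splatting are all simplifying assumptions.

```python
import numpy as np

def unwrap_cylindrical(points, intensities, width=512, height=512):
    """Flatten points on a roughly cylindrical surface into a 2-D image.
    points: (N, 3) array; the cylinder axis is assumed to run along y."""
    x, y, z = points.T
    theta = np.arctan2(z - z.mean(), x - x.mean())   # angle around the axis
    u = (theta - theta.min()) / (np.ptp(theta) + 1e-12)
    v = (y - y.min()) / (np.ptp(y) + 1e-12)
    img = np.zeros((height, width))
    cols = np.clip((u * (width - 1)).astype(int), 0, width - 1)
    rows = np.clip((v * (height - 1)).astype(int), 0, height - 1)
    img[rows, cols] = intensities                    # nearest-point splat
    return img
```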
In the field of vision-based systems for object detection and classification, thresholding is a key pre-processing step and a well-known technique for image segmentation. Segmentation of medical images, such as computed axial tomography (CAT), magnetic resonance imaging (MRI), X-ray, phase contrast microscopy, and histological images, presents problems such as high variability in human anatomy and variation across modalities. Recent advances in computer-aided diagnosis of histological images help facilitate the detection and classification of diseases. Since most pathology diagnosis depends on the expertise and ability of the pathologist, there is a clear need for an automated assessment system. Histological images are stained to a specific color to differentiate each component in the tissue. Segmentation and analysis of such images are problematic, as they present high variability in terms of color and cell clusters. This paper presents an adaptive thresholding technique that aims at segmenting cell structures from Haematoxylin and Eosin stained images. The thresholded result can further be used by pathologists to perform effective diagnosis. The effectiveness of the proposed method is analyzed by visually comparing the results to state-of-the-art thresholding methods such as Otsu, Niblack, Sauvola, Bernsen, and Wolf. Computer simulations demonstrate the efficiency of the proposed method in segmenting critical information.
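To make the comparison concrete: the sketch below is the generic OpenCV recipe for locally adaptive thresholding of an H&E slide, not the paper's method. Nuclei stained by haematoxylin appear dark in grayscale, so an inverted local-mean threshold picks them out even when stain intensity drifts across the slide; the block size, offset, and morphology kernel are illustrative choices.

```python
import cv2

def segment_nuclei(bgr):
    """Baseline adaptive-threshold pipeline for H&E images (generic recipe,
    not the paper's proposed method)."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    # Local Gaussian-weighted mean thresholding tolerates uneven staining
    # better than a single global (Otsu) threshold; block size 51, offset 10
    mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, 51, 10)
    # Morphological opening removes small speckle from the binary mask
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```

Global methods like Otsu fail exactly where the abstract says they do: when stain color and cell-cluster density vary across the image, no single threshold separates all nuclei, which is what motivates adaptive schemes.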
Biometrics, particularly palm print authentication, has been a stimulating research area due to the abundance of features in the palm. Stable features and effective matching are the most crucial steps for an authentication system. In conventional palm print authentication systems, matching is based on flexion creases, friction ridges, and minutiae points. Contactless palm print imaging is an emerging technology; however, it tends to suffer from fluctuations in image quality and texture loss due to factors such as varying illumination, occlusion, noise, pose, and ghosting. These variations decrease the performance of authentication systems. Furthermore, real-time palm print authentication in large databases continues to be a challenging task. To effectively solve these problems, features that are invariant to these anomalies are required. This paper proposes a robust palm print matching framework through a comparative study of different local geometric feature detectors: Difference-of-Gaussian, Hessian, Hessian-Laplace, Harris-Laplace, and Multiscale Harris. These detectors are coupled with the Scale-Invariant Feature Transform (SIFT) descriptor to describe the identified features. Additionally, a two-stage refinement process is carried out to obtain the most stable matches. Computer simulations demonstrate that the accuracy of the system increases effectively, with an EER of 0.86% when the Harris-Laplace detector is used on the IITD database.
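The best-performing combination the abstract reports (Harris-Laplace detection with SIFT description) can be assembled directly in OpenCV, as sketched below. The paper's two-stage refinement is not specified, so the sketch instantiates it as Lowe's ratio test followed by a RANSAC geometric check, which is a common choice but an assumption here; the Harris-Laplace detector requires opencv-contrib-python.

```python
import cv2
import numpy as np

def match_palmprints(img1, img2):
    """Harris-Laplace keypoints described with SIFT, refined in two stages."""
    detector = cv2.xfeatures2d.HarrisLaplaceFeatureDetector_create()
    sift = cv2.SIFT_create()
    kp1, des1 = sift.compute(img1, detector.detect(img1, None))
    kp2, des2 = sift.compute(img2, detector.detect(img2, None))
    # Stage 1: Lowe's ratio test discards ambiguous correspondences
    knn = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in knn if m.distance < 0.75 * n.distance]
    if len(good) < 4:                 # homography needs at least 4 points
        return []
    # Stage 2: RANSAC keeps only geometrically consistent matches
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if inliers is None:
        return []
    return [m for m, keep in zip(good, inliers.ravel()) if keep]
```

Swapping the detector line for `cv2.SIFT_create()` (Difference-of-Gaussian) or another contrib detector reproduces the kind of comparative study the abstract describes.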
KEYWORDS: Biometrics, 3D modeling, Computer simulations, 3D acquisition, Image quality, Image enhancement, 3D image processing, Scanners, Data modeling, Databases
Despite the advancements of fingerprint recognition in the 2-D and 3-D domains, authenticating deformed/post-mortem fingerprints continues to be an important challenge. Prior cleansing and reconditioning of the deceased finger is required before acquisition of the fingerprint, and the finger must be precisely and carefully manipulated by an operator to record the fingerprint impression. This process may damage the structure of the finger, which subsequently leads to higher false rejection rates. This paper proposes a non-invasive method to perform 3-D deformed/post-mortem finger modeling, which produces a 2-D rolled-equivalent fingerprint for automated verification. The presented novel modeling method involves masking, filtering, and unrolling. Computer simulations were conducted on finger models with different depth variations obtained from FlashScan3D LLC. Results illustrate that the modeling scheme provides a viable 2-D fingerprint of deformed models for automated verification. The quality and adaptability of the obtained unrolled 2-D fingerprints were analyzed using NIST fingerprint software. The presented method could eventually be extended to other biometric traits, such as the palm, foot, and tongue, for security and administrative applications.
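One way to realize the unrolling step, sketched below, is arc-length resampling: walking along the curved finger surface at equal ground distance flattens each scanline into a rolled-equivalent row while preserving ridge spacing. Masking and filtering are assumed to have been applied upstream, and the row-wise treatment is a simplification of whatever the paper's full pipeline does.

```python
import numpy as np

def unroll_row(z_profile, texture_row, n_out):
    """Resample one scanline at equal arc length along its (x, z) surface
    profile, so curved-surface distances map to flat-image distances."""
    x = np.arange(len(z_profile), dtype=np.float64)
    # Cumulative arc length along the surface profile
    ds = np.hypot(np.diff(x), np.diff(z_profile))
    s = np.concatenate([[0.0], np.cumsum(ds)])
    # Sample the texture at equally spaced arc-length positions
    s_new = np.linspace(0.0, s[-1], n_out)
    return np.interp(s_new, s, texture_row)

def unroll_finger(depth, texture, n_out=512):
    """Apply row-wise unrolling across a whole masked finger scan."""
    return np.vstack([unroll_row(z, t, n_out)
                      for z, t in zip(depth, texture)])
```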