This PDF file contains the front matter associated with SPIE Proceedings Volume 12100, including the Title Page, Copyright information, Table of Contents, and Conference Committee listings.
Satellite images are widely available to the public and are used in various fields, including natural disaster analysis, meteorology, and agriculture. As with any type of image, satellite images can be altered using image manipulation tools. A common manipulation is splicing, i.e., pasting a region from a different source image on top of an image. Most manipulation detection methods designed for images captured by "consumer cameras" tend to fail when used with satellite images. In this paper we propose a machine learning approach, Sat U-Net, to fuse the results of two existing forensic splicing localization methods to increase their overall accuracy and robustness. Sat U-Net is a U-Net-based architecture exploiting several Transformers to enhance performance. Sat U-Net fuses the outputs of two unsupervised splicing detection methods, Gated PixelCNN Ensemble and Vision Transformer, to produce a heatmap highlighting the manipulated image region. We show that our fusion approach trained on images from one satellite can be lightly retrained on a few images from another satellite to detect spliced regions. We compare our approach to well-known splicing detection methods (i.e., Noiseprint) and segmentation techniques (i.e., U-Net and Nested Attention U-Net). We conducted our experiments on two large datasets: one containing images from Sentinel-2 satellites and the other containing images from the WorldView-3 satellite. Our experiments show that the proposed fusion method compares favorably to other techniques in localizing spliced areas, using the Jaccard index and Dice score as metrics on both datasets.
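For reference, the two reported metrics can be computed directly from binary localization masks; a minimal NumPy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def jaccard_and_dice(pred_mask: np.ndarray, gt_mask: np.ndarray):
    """Jaccard index (IoU) and Dice score between two binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    jaccard = intersection / union if union > 0 else 1.0
    dice = 2 * intersection / total if total > 0 else 1.0
    return jaccard, dice
```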
Depth estimation is an essential component in understanding the 3D geometry of a scene. In comparison to traditional depth estimation methods such as structure from motion and stereo vision matching, determining depth relations using a single camera is challenging. Recent advancements in convolutional neural networks have accelerated research in monocular depth estimation. However, most technologies infer depth maps from lower-resolution images due to network capacity and complexity issues. Another challenge in depth estimation is ambiguous and sparse depth maps, caused by labeling errors, hardware faults, or occlusions. This paper presents a novel end-to-end trainable convolutional neural network architecture, the depth transverse transformer network (DTTNet). The proposed network is designed and optimized to perform monocular depth estimation and aims to exploit multi-resolution representations to perform pixel-wise depth estimation more accurately. To further improve the accuracy of depth estimation, several ad hoc sub-networks are subsequently proposed. Extensive computer simulations on the NYU Depth V2 and SUN RGB-D datasets demonstrate the effectiveness of the proposed DTTNet against state-of-the-art methods. DTTNet can potentially optimize depth perception in intelligent systems such as automated driving and video surveillance applications, computational photography, and augmented reality. The source code is available at https://github.com/shreyaskamathkm/DTTNet
With the emergence of advanced 2D and 3D sensors such as high-resolution visible cameras and less expensive lidar sensors, there is a need to fuse information extracted from multiple sensor modalities for accurate object detection, recognition, and tracking. To train a system with data captured by multiple sensors, the regions of interest in the data must be accurately aligned. A necessary step in this process is a fine, pixel-level registration between the modalities. We propose a robust multimodal data registration strategy for automatically registering visible and lidar data captured by sensors embedded in aerial vehicles. Coarse registration of the data is performed using the metadata, such as timestamps, GPS, and IMU information, provided by the data acquisition systems. The challenge is that these modalities contain very different sets of information and cannot be aligned using classical methods. Our proposed fine registration mechanism employs deep-learning methodologies for feature extraction in each modality. For our experiments, we use a 3D geopositioned aerial lidar dataset along with the coarsely registered visible data and extract SIFT-like features from both data streams. These SIFT-like features are generated by appropriately trained deep-learning algorithms.
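For context, a classical fine-registration baseline with handcrafted SIFT features and RANSAC looks roughly like the sketch below; the file names are placeholders, and the paper's deep-learned SIFT-like features would stand in for cv2.SIFT_create here:

```python
import cv2
import numpy as np

# Hypothetical inputs: a grayscale visible image and a lidar-derived
# raster (e.g., intensity or height) rendered to a similar viewpoint.
visible = cv2.imread("visible.png", cv2.IMREAD_GRAYSCALE)
lidar_raster = cv2.imread("lidar_intensity.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(visible, None)
kp2, des2 = sift.detectAndCompute(lidar_raster, None)

# Ratio-test matching, then robust homography estimation with RANSAC.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```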
Username, password, and biometrics form three-factor authentication for enhanced cybersecurity. Adding keystroke and mouse biometrics to authentication improves cybersecurity. Keystroke dynamics refers to the process of measuring and assessing a human's typing rhythm on digital devices. Keystroke timing information such as digraph, dwell time, and flight time is used in our experimental datasets. Mouse dynamics records mouse motion (speed) and left-, right-, and double-click timing information. Our own dataset includes both types of dynamics from the same group of subjects. We develop recurrent neural network (RNN) models and support vector machine (SVM) models to represent a user's biometrics. Keystroke and mouse dynamics can be fed to the models separately as features for user verification or identification, and feature fusion is applied to improve accuracy. Our results show that the RNN method outperforms traditional methods such as SVM and that fusion further improves performance.
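To make the timing features concrete, here is a minimal sketch of how dwell, flight, and digraph times might be derived from raw key events; the (key, press, release) event format is an assumption, not the paper's logging scheme:

```python
# Hypothetical key-event log: (key, press_time, release_time) in seconds.
events = [("h", 0.000, 0.090), ("i", 0.140, 0.215)]

def keystroke_features(events):
    """Dwell time per key, flight time between consecutive keys,
    and digraph (press-to-press) latency for each key pair."""
    dwell = [release - press for _, press, release in events]
    flight = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    digraph = [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]
    return dwell, flight, digraph
```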
The border irregularity of lesions or tumours is an important sign (or feature) contributing to the prediction of tumor malignancy. This paper is concerned with developing automatic computer vision methods for assessing and recognizing thyroid nodule border irregularity from ultrasound images. Unlike many existing schemes, our methods rely on a small set of points on the nodule border marked manually by clinicians. To mitigate the absence of a fully segmented lesion boundary, we first apply cubic-spline interpolation to the region of interest (ROI) points to approximate the lesion border and then select equal numbers of points from the approximated border at equal angular distances. We developed two complementary approaches to investigate the global (big indentations and protrusions) and local (small zigzag) irregularity features of the nodule. The first approach includes two Euclidean distance-based methods and a method inspired by Fractal Dimensions (FD). The distance-based methods use the interpolated border and its radial distance function, measured from border points to a reference point (the centroid) or a reference shape (the convex hull), while the FD-inspired method uses the ratio of the interpolated border perimeter to that of a fitted ellipse to calculate an irregularity index. The second approach performs texture analysis within ribbons of different widths constructed around the border line, using a uniform local binary pattern (ULBP) feature vector. We evaluate and compare the performance of the methods from the two approaches using two datasets of 395 and 100 ultrasound images of thyroid nodules, collected from two hospitals and labelled by experienced radiologists. The first is used as a training and internal testing set, while the second is used as an external testing set. We show the viability of our methods, which attain accuracy rates between 70% and 90%.
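A minimal sketch of the border approximation and radial distance function described above, assuming a roughly star-convex nodule border; names and sampling densities are illustrative:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def radial_distance_function(marked_points: np.ndarray, n_samples: int = 360):
    """Approximate the lesion border with a closed cubic spline through
    clinician-marked (x, y) points, then sample the radial distance from
    the centroid at equal angular steps."""
    pts = np.vstack([marked_points, marked_points[:1]])   # close the contour
    t = np.linspace(0.0, 1.0, len(pts))
    spline = CubicSpline(t, pts, bc_type="periodic")
    dense = spline(np.linspace(0.0, 1.0, 4000))           # dense border trace

    centroid = dense.mean(axis=0)
    angles = np.arctan2(dense[:, 1] - centroid[1], dense[:, 0] - centroid[0])
    radii = np.linalg.norm(dense - centroid, axis=1)

    order = np.argsort(angles)                            # sort border by angle
    target = np.linspace(-np.pi, np.pi, n_samples, endpoint=False)
    return centroid, target, np.interp(target, angles[order], radii[order])
```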
Ultrasound scan (US) imagery is an important tool for radiologists to make fast and reliable diagnosis decisions about breast lesion status (benign or malignant). Accurate and automatic segmentation of breast lesions is critical for annotating lesion characteristics, such as margin smoothness and regularity, in support of the diagnosis decision. The fully convolutional network (FCN) is one of the commonly used deep learning methods for semantic segmentation. This paper is concerned with the effective adaptation of FCN solutions to segmenting breast lesions from 2D ultrasound images. The paper first evaluates an existing FCN solution to the problem at hand and compares its performance with another popular method based on U-Net. The paper then highlights one key issue with the FCN: false positive pixels near the boundary of a lesion and false positive pixels forming false lesions. The paper then investigates several methods for reducing such false positive pixels, including the use of data augmentation and the choice of loss functions in training the models. Experimental results using several datasets collected from various sources show that our adapted FCN method generally outperforms U-Net-based solutions and that the false positive reduction methods we attempted reduce false positive pixels both close to the lesion boundary and in regions separate from the true lesion.
The difficulty of obtaining a sufficient number of appropriately labelled samples is a major obstacle to learning class-discriminating features by Machine Learning (ML) algorithms for tumor diagnostics from Ultrasound (US) images. This is often mitigated by sample augmentation, whereby new samples are generated from existing samples by rotation and flipping operations, Singular Value Decomposition (SVD), or synthetic image generation by Generative Adversarial Networks (GANs). The first approach does not generate new genuine samples; SVD-generated images may not be easy to recognize as US tumor scans; and while GAN-generated images are visually convincing, their use for diagnostics may lead to overfitting and is subject to adversarial attacks. We propose an innovative sample augmentation approach that utilizes our recently developed Tumor Margin Appending (TMA) scheme. The TMA scheme constructs the Convex Hull (CH) of the tumor region using a small set of radiologist-marked tumor boundary points and crops the image at different radial expansion ratios of the CH onto surrounding tissue. Various ML algorithms, using handcrafted features and Convolutional Neural Networks (CNNs), trained with TMA images at different ratios have achieved acceptable diagnostic accuracies. In this paper, our sample augmentation scheme expands the ML training datasets by including TMA samples at several expansion ratios. Results of experiments on training CNN diagnostic schemes for breast tumors yield improved classification performance with additional benefits: the scheme is robust against the different cropping practices inadvertently used at different hospitals, and it serves as a regularizer that reduces model overfitting when tested on unseen datasets obtained with unknown tumor segmentation and cropping procedures.
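A rough sketch of the margin-appending idea, under the assumption that expansion is a radial scaling of the convex hull vertices about their centroid; the exact TMA construction may differ:

```python
import numpy as np
from matplotlib.path import Path
from scipy.spatial import ConvexHull

def tma_crop_mask(image_shape, boundary_points, expansion_ratio=1.2):
    """Boolean mask covering the convex hull of marked tumor boundary
    points (given as (x, y) pixel coordinates), radially expanded about
    the hull centroid by the given ratio."""
    hull = ConvexHull(boundary_points)
    verts = boundary_points[hull.vertices]
    centroid = verts.mean(axis=0)
    expanded = centroid + expansion_ratio * (verts - centroid)

    # Rasterize the expanded polygon over the image grid.
    ys, xs = np.mgrid[0:image_shape[0], 0:image_shape[1]]
    pixels = np.column_stack([xs.ravel(), ys.ravel()])
    mask = Path(expanded).contains_points(pixels)
    return mask.reshape(image_shape)
```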
Drug discovery and development pipelines are complex, long, and dependent on several factors, including FDA trials. Recently, artificial intelligence has moved from largely theoretical studies to real-world applications. The pharmaceutical industry currently faces challenges sustaining its drug development programs because of increased R&D costs and reduced efficiency. There is a critical need for time- and cost-efficient strategies to analyze and interpret these data to advance drug development prediction, including predicting whether new drugs will pass FDA trials. In this study, we attempt to accomplish four tasks: (1) create a reliable dependent variable to categorize drugs with minimal noise, (2) link this dependent variable to predictor variables, (3) utilize a boosted tree model with the principal component method to develop an algorithm to predict FDA trial outcomes, and (4) develop a design matrix of regressor variables for 3500 approved and investigational drugs built with DrugBank 5.1.8 Drug Targets and Drug Categories data, as well as ATC codes from both the DrugBank and ChEMBL databases. Intensive computer simulations (1) show a 91% prediction success rate over a wide range of drug categories, (2) provide new insights into predicting the success of drug development, and (3) present data that can save time and resources and aid decision-making for companies' future investigations, potentially benefiting clinical trials and investment.
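A minimal sketch of the boosted-tree-with-principal-components idea using scikit-learn; the placeholder data, component count, and hyperparameters are assumptions, not the study's settings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder design matrix standing in for the DrugBank/ChEMBL features;
# in practice X has one row per drug and binary indicator columns.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(3500, 800)).astype(float)
y = rng.integers(0, 2, size=3500)  # placeholder labels: 1 = passed trials

model = make_pipeline(
    PCA(n_components=50),  # principal component step
    GradientBoostingClassifier(n_estimators=100, learning_rate=0.05),
)
print(cross_val_score(model, X, y, cv=5).mean())  # cross-validated accuracy
```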
The US Department of Defense has a need to successfully navigate in operational regions where GPS is degraded or denied. When GPS is denied, navigation of aerial platforms, including manned and unmanned aerial vehicles (UAVs), for Intelligence, Surveillance, and Reconnaissance (ISR) missions, targeting missions, or autonomous cargo delivery missions, becomes compromised. In the absence of GPS, navigating from pure inertial solutions leads to rapidly growing position errors due to drift in the inertial measurement unit. Vision Aided Navigation (VAN) approaches can aid the inertial solution to reduce navigation error, but require salient and distinct scene content for image alignment. In this paper, we present an approach to optimal path planning for VAN over operational ground regions that minimizes navigation position error. The approach uses automated pre-mission visual fiducial discovery to identify regions in imagery of the fly-over area that contain unique, salient, discriminative, and stable feature content. The discovered visual fiducial regions are used to form a map of probabilities of successful VAN at each point of the gridded fly-over region. An optimal path planning algorithm uses the probability map to determine the path over the fly-over region that maximizes navigability and minimizes VAN positioning error. Constraints, such as no-fly zones and path length constraints, are incorporated into the formulation to generate a constrained optimization problem. We present the mathematical formulation of the constrained path planning optimization problem and generate numerical results demonstrating performance.
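As one way to read the path planning step, a minimal unconstrained sketch: Dijkstra over the probability map with per-cell cost -log(p), so minimizing total cost maximizes the product of per-cell VAN success probabilities. The paper's constrained formulation with no-fly zones and path-length limits is not reproduced here:

```python
import heapq
import numpy as np

def most_navigable_path(prob_map: np.ndarray, start, goal):
    """Dijkstra over a grid of VAN success probabilities; minimizing the
    sum of -log(p) maximizes the product of per-cell probabilities."""
    cost = -np.log(np.clip(prob_map, 1e-6, 1.0))
    dist = np.full(prob_map.shape, np.inf)
    prev = {}
    dist[start] = cost[start]
    heap = [(cost[start], start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            break
        if d > dist[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < prob_map.shape[0] and 0 <= nc < prob_map.shape[1]:
                nd = d + cost[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path, node = [], goal
    while node != start:  # walk predecessors back to the start
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]
```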
As traditional RGB cameras cannot perform well in darkness and poor weather conditions, thermal cameras have become an essential component of edge systems. This paper proposes a lightweight binarized Faster R-CNN (a state-of-the-art instance segmentation model), called BiThermalNet, for thermal object detection with high detection capability and lower memory usage. It introduces a new Region Proposal Network (RPN) structure built on a binary neural network (BNN) that lowers model size by 16% while achieving higher accuracy. BiThermalNet adds newly designed residual gates to maximize information entropy and introduces channel-wise weights and biases to reduce binarization errors. Extensive experiments on different thermal datasets (such as the Dogs&People Thermal Dataset and UNIRI-TID) confirm that BiThermalNet outperforms the traditional Faster R-CNN by sizable margins with a smaller model size. Moreover, a comparative analysis of the proposed methods on thermal images is also presented.
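For intuition on the binarization step, here is a minimal PyTorch sketch of sign binarization with a straight-through gradient estimator plus XNOR-Net-style channel-wise scaling; this is a standard construction for training BNN weights, not BiThermalNet's residual gates:

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.sign(w)

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        return grad_output * (w.abs() <= 1).float()  # pass grad inside [-1, 1]

# Channel-wise scaling (mean |W| per filter) partially recovers the
# dynamic range lost to binarization.
w = torch.randn(16, 8, 3, 3, requires_grad=True)  # conv weights
alpha = w.detach().abs().mean(dim=(1, 2, 3), keepdim=True)
w_bin = alpha * BinarizeSTE.apply(w)
```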
In hazy weather, images and videos of outdoor scenes often suffer from inadequate visibility, low contrast, and color shift due to atmospheric light scattering from airborne particles. In general, haze is not uniformly distributed, and revealing the content behind hazy scenes as if they were haze-free is highly challenging for computer vision-based applications. This paper aims to: i) develop a new optimization-based transmission map for removing haze or fog from a single image or video; and ii) demonstrate the utility and effectiveness of the developed technique. The proposed method offers a single-image de-hazing algorithm based on transmission map optimization and novel enhancement techniques. Intensive computer simulations on the natural Live-Haze dataset and synthetic image datasets such as O-HAZY show that: (1) the presented approach effectively removes haze and prevents color distortion from undesirable de-hazing; (2) the resulting dehazed images exhibit realistic colors and remarkable details; and (3) the proposed method restores the visibility of hazy scenes with colorful and natural appearances.
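Transmission-map de-hazing rests on the standard atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)); given an estimated transmission map t and atmospheric light A, scene radiance is recovered as below (a generic sketch, not the paper's optimization):

```python
import numpy as np

def recover_radiance(hazy: np.ndarray, t: np.ndarray, A: np.ndarray, t0=0.1):
    """Invert I = J*t + A*(1 - t); t is clamped below by t0 to avoid
    amplifying noise where the transmission estimate is tiny."""
    t = np.clip(t, t0, 1.0)[..., None]   # broadcast over color channels
    J = (hazy.astype(float) - A) / t + A
    return np.clip(J, 0.0, 255.0).astype(np.uint8)
```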
Adverse weather conditions, such as rain, degrade the visual quality of images and significantly impact the performance of vision systems for drone-based video surveillance and self-driving car applications. It is essential to develop algorithms that can automatically remove these artifacts without degrading the rest of the image. Several methods have been proposed in literature and practice to address this problem. They mainly focus on specific rain models, such as droplets, streaks, mist, or a combination of these. Real-life rain images are largely randomized, with diverse rain sizes, types, densities, and directions. Furthermore, rain impacts various image parts differently and is often randomly distributed. Most existing de-raining algorithms cannot remove drops, streaks, and mist from images simultaneously. This paper addresses this issue by reviewing existing algorithms and datasets through a rain model lens. We present surveys and quantitative benchmarking of state-of-the-art algorithms based on the rain types they aim to remove. While other review papers exist on single-image de-raining, our work outlines the different algorithms and datasets available for each specific rain model. Finally, the paper makes the following contributions: (1) it selects the most recent state-of-the-art algorithms and shows their performance for each rain type on our combined dataset, the Combination Rain Model Dataset; and (2) it offers insights into the issues that still exist in the developing field of image de-raining and into future steps for the field.
This paper analyzes the known Fourier transform-based alpha-rooting method of image enhancement and describes a new alpha-rooting method that uses the autocorrelation function of the image. Alpha-rooting in the frequency domain can be described by a Taylor series, and likewise in the spatial domain by using the inverse 2-D DFT. In such a series, alpha-rooting is the convolution of the image with a series of autocorrelation functions. The application of the Taylor series in alpha-rooting allows the use of parameterized filters even for values of the parameter alpha in a much larger interval than [0,1]. Examples of the application of these two filters to the enhancement of grayscale images are given.
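For the frequency-domain variant, a minimal sketch of classical alpha-rooting; the paper's autocorrelation-based Taylor-series filter is not reproduced here:

```python
import numpy as np

def alpha_rooting(image: np.ndarray, alpha: float = 0.9) -> np.ndarray:
    """Classic frequency-domain alpha-rooting: keep each Fourier phase
    and raise each magnitude to the power alpha. For alpha < 1, the
    dominant low-frequency magnitudes are compressed most, boosting
    relative high-frequency content (edges and detail)."""
    F = np.fft.fft2(image.astype(float))
    magnitude = np.abs(F)
    enhanced = F * np.power(magnitude + 1e-12, alpha - 1.0)  # |F|^alpha * phase
    out = np.real(np.fft.ifft2(enhanced))
    out -= out.min()
    return (255 * out / max(out.max(), 1e-12)).astype(np.uint8)
```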
Accurate ocular disorder classification and estimation of cornea depth and morphological changes depend on clear imaging of the affected structures. Ophthalmologists typically employ Optical Coherence Tomography (OCT) to help diagnose these conditions. This paper presents a new method called Alpha Mean Trim Local Binary Pattern (AMT-LBP) for automated texture classification of specific macular diseases detected on OCT images of the retinal membrane. The proposed method achieved an overall accuracy of 99% using 10-fold cross-validation on the Duke University dataset [9].
Chile has been in a mega-drought since 2010, leading to critical water scarcity in the central zone. Research projects that by 2040, climate change, in conjunction with lax use of water for irrigation, will cause severe water stress across the whole country. Agriculture accounts for 72% of the water consumption in the zone, and in cities like El Tabo, 18% of the population consumes water delivered by cistern trucks. Therefore, reliable registering of water use is essential for sustainable water management. However, in-field manual measurements of water use can be tedious, time- and labor-intensive, may exhibit significant spatial variability, and are inefficient for surveying large areas. Currently, private initiatives lead the efforts in water use optimization, consisting mainly of expensive in-situ sensors and drone imagery, while the public sector is designing new policies, regulations, and management. Still, there is no country-level record of the current status of farms and paddocks and their variation through time. This paper aims to develop and test an automatic paddock boundary segmentation/recognition system using multispectral images from Sentinel-2. The developed system includes new image enhancements, band selection, and the training of a model using the images and handmade polygons. Given the boundaries, farmers could use them to understand the impact climate change is having on their crops, enabling data-driven irrigation scheduling and other optimization tools to conserve water and keep its use to the minimum possible. The method was tested with Sentinel-2 L2A data at 10-meter ground resolution in four spectral bands: Band 02 (visible blue), Band 03 (visible green), Band 04 (visible red), and Band 08 (near-infrared). Multiple metrics are presented for the results. The system can play a vital role in improving Chile's water accounting, both independently and in combination with in-situ monitoring.
In this paper, we describe gradient operators defined along eight or more directions. For this, a model of rotating coefficients inside a given mask is proposed. The set of such gradient operators, called the compass gradient, is described with different examples. 5×5 masks are considered for operators such as the Sobel, Prewitt, and Agaian gradients along 16 directions. The Nevatia-Babu and Art-Sobel compass gradients are also described, and illustrative examples are given. The presented approach can easily be extended to larger windows and even to more than 16 directions.
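A minimal sketch of the coefficient-rotation model for the 3×3, 8-direction case; the paper works with 5×5 masks and up to 16 directions:

```python
import numpy as np
from scipy.ndimage import convolve

def compass_kernels(base: np.ndarray):
    """Generate 8 directional kernels by rotating the 8 outer
    coefficients of a 3x3 mask one step at a time around the center."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    values = [base[r, c] for r, c in ring]
    kernels = []
    for shift in range(8):
        k = base.copy()
        for (r, c), v in zip(ring, values[shift:] + values[:shift]):
            k[r, c] = v
        kernels.append(k)
    return kernels

sobel = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def compass_edges(img: np.ndarray) -> np.ndarray:
    """Edge magnitude: maximum absolute response over the 8 directions."""
    responses = [np.abs(convolve(img.astype(float), k))
                 for k in compass_kernels(sobel)]
    return np.max(responses, axis=0)
```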
Single-image super-resolution (SISR), which maps a low-resolution observation to a high-resolution image, has been extensively utilized in various computer vision applications. With the advent of convolutional neural networks (CNNs), numerous algorithms have emerged that achieve state-of-the-art results. However, the main drawback of CNNs is their neglect of the interrelationship between the RGB color channels. This neglect discards crucial structural information of color and yields a non-optimal representation of color images. Furthermore, most of these CNN-based methods contain millions of parameters and many layers, limiting their practical applications. To overcome these drawbacks, an end-to-end trainable single-image super-resolution method, the Quaternion-based Image Super-Resolution network (QSRNet), which takes advantage of quaternion theory, is proposed in this paper. QSRNet aims at maintaining the local and global interrelationships between the channels and produces high-resolution images with approximately 4x fewer parameters compared to standard CNNs. Extensive computer experiments were conducted on publicly available benchmark datasets, including DIV2K, Flickr2K, Set5, Set14, BSD100, Urban100, and UEC100, to demonstrate the effectiveness of the proposed QSRNet compared to traditional CNNs.
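At the heart of quaternion layers is the Hamilton product, which couples all four components and is how quaternion networks preserve inter-channel structure; a minimal NumPy sketch (an RGB pixel is typically embedded as a pure quaternion with zero real part):

```python
import numpy as np

def hamilton_product(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Hamilton product of quaternions stored as (..., 4) arrays
    ordered (real, i, j, k)."""
    a1, b1, c1, d1 = np.moveaxis(p, -1, 0)
    a2, b2, c2, d2 = np.moveaxis(q, -1, 0)
    return np.stack([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i part
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j part
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k part
    ], axis=-1)
```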
As society becomes increasingly reliant on autonomous vehicles, it becomes necessary for these vehicles to navigate new environments. Environmental data are expensive to label, especially because they come from many different sensors, and it can be difficult to interpret how the underlying models work. Therefore, an adequate machine learning model for multi-modal, unsupervised domain adaptation (UDA) that is both accurate and explainable is necessary. We aim to improve xMUDA, a state-of-the-art multi-modal UDA model, by incorporating a multi-step binary classification algorithm that allows us to prioritize certain data labels; alongside human evaluation, we report the mIoU and accuracy of the final output.
During the pandemic, it is a critical task to recognize individuals and verify their identity without touching a surface or removing the face mask. Compared with other biometric modalities, iris recognition provides an accurate, reliable, and contactless biometric measure. Traditional iris recognition systems require high-quality frontal iris images, and this image quality dependency limits recognition performance in standoff applications. Standoff biometric systems work in a less controlled environment where the captured images may be nonideal and off-angle. Since segmentation is the first step in the recognition pipeline, accurate segmentation is critical to achieving high recognition performance, especially for off-angle iris images. Recent advances in deep learning enable the use of convolutional neural networks (CNNs) for the challenging iris segmentation task. During the training process, binary iris segmentation masks are fed to the CNN framework to learn the iris texture, with all other eye structures included in a single background class. However, pupil and sclera segmentation may provide useful additional information for iris segmentation. In this paper, we investigate CNN-based iris segmentation frameworks for binary segmentation and multi-class segmentation. We first train deep networks with binary segmentation masks (iris vs. others). Then, additional deep networks are trained with multi-class segmentation masks in which the pupil, iris texture, sclera, and other eye structures form separate classes. Finally, we compare segmentation accuracies on off-angle iris images captured at angles from -50° to 50°. Results from real experiments show the effectiveness of the proposed method in segmenting off-angle iris images.
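The two training regimes differ only in how the ground-truth masks are encoded; a minimal sketch, with a hypothetical label convention:

```python
import numpy as np

# Hypothetical label convention: 0 = other, 1 = pupil, 2 = iris, 3 = sclera.
IRIS_LABEL = 2

def to_binary_mask(multiclass_mask: np.ndarray) -> np.ndarray:
    """Collapse a multi-class eye segmentation mask into the binary
    iris-vs-others mask used to train the binary networks."""
    return (multiclass_mask == IRIS_LABEL).astype(np.uint8)
```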
Satellite imagery provides an efficient means of assessing and effectively planning search and rescue efforts in the aftermath of disasters such as earthquakes, flooding, tsunamis, wildfires, and conflicts. It enables timely visualization of the buildings and human population affected by these disasters and provides humanitarian organizations with crucial information needed to strategize and deliver much-needed aid effectively. Recent research on remote sensing combines machine learning methodologies with satellite imagery to automate information extraction, thus reducing turnaround time and manual labor. The existing state-of-the-art approach for building damage assessment relies on an ensemble of different models to obtain independent predictions that are then aggregated into one final output. Other methods rely on a multi-stage model that involves a building localization module and a damage classification module. These methods are either not end-to-end trainable or are impractical for real-time applications. This paper proposes an Attention-based Two-Stream High-Resolution Network (ATS-HRNet), which unifies building localization and classification in an end-to-end trainable manner. The basic residual blocks in HRNet are replaced with attention-based residual blocks to improve the model's performance. Furthermore, a modified CutMix data augmentation technique is introduced for handling class imbalance in satellite imagery. Experiments show that our approach performs significantly better than the baseline and other state-of-the-art methods for building damage classification.
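For reference, vanilla CutMix is sketched below; the paper's modified variant for class imbalance is not reproduced, and labels are assumed numeric (e.g., one-hot arrays):

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, rng=np.random.default_rng()):
    """Standard CutMix: paste a random rectangle from img_b into img_a
    and mix the labels by the pasted-area ratio."""
    h, w = img_a.shape[:2]
    lam = rng.beta(1.0, 1.0)                       # mixing coefficient
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = rng.integers(h), rng.integers(w)      # random rectangle center
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    lam_adj = 1 - (y2 - y1) * (x2 - x1) / (h * w)  # exact area ratio
    return mixed, lam_adj * label_a + (1 - lam_adj) * label_b
```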
Action recognition in video is known to be more challenging than image recognition. Unlike image recognition models, which use 2D convolutional blocks, action classification models require an additional dimension to capture the spatio-temporal information in video sequences. This intrinsically makes video action recognition models computationally intensive and significantly more data-hungry than their image recognition counterparts. Unequivocally, existing video datasets such as Kinetics, AVA, Charades, Something-Something, HMDB51, and UCF101 have had a tremendous impact on recently evolving video recognition technologies. Artificial intelligence models trained on these datasets have largely benefited applications such as behavior monitoring in elderly people, video summarization, and content-based retrieval. However, this growing concept of action recognition has yet to be explored in Intelligent Transportation Systems (ITS), particularly in vital applications such as incident detection. This is partly due to the lack of annotated datasets adequate for training models suitable for such direct ITS use cases. In this paper, the concept of video action recognition is explored to tackle the problem of highway incident detection and classification from live surveillance footage. First, a novel dataset, HWID12 (Highway Incidents Detection), is introduced. HWID12 consists of 11 distinct highway incident categories and one additional category for negative samples representing normal traffic. The proposed dataset includes 2780+ video segments of 3 to 8 seconds each on average, and 500k+ temporal frames. Next, a baseline for highway incident detection and classification is established with a state-of-the-art action recognition model trained on the proposed HWID12 dataset. Performance benchmarking for 12-class (normal traffic vs. 11 incident categories) and 2-class (incident vs. normal traffic) settings is performed, revealing recognition accuracies of up to 88% and 98%, respectively.
The dynamics of gaze coordination in natural contexts are affected by various properties of the task, the agent, the environment, and their interaction. Artificial Intelligence (AI) lays the foundation for detection, classification, segmentation, and scene analysis. Much of the AI in everyday use is dedicated to predicting people's behavior. However, a purely data-driven approach cannot solve development problems alone. Therefore, it is imperative that decision-makers also consider another AI approach, causal AI, which can help identify precise cause-and-effect relationships. This article presents a novel Gaze Feature Transverse Network (Gaze-FTNet) that generates close-to-human gaze attention. The proposed end-to-end trainable approach leverages a feature transverse network (FTNet) to model long-term dependencies for optimal saliency map prediction. Moreover, several modern backbone architectures are explored, tested, and analyzed. Synthetically predicting human attention from monocular RGB images will benefit several domains, particularly human-vehicle interaction, autonomous driving, and augmented reality.
There are more than 400,000 new cases of kidney cancer each year, and surgery is its most common treatment. Accurate segmentation and characterization of kidneys and kidney tumors is an important step in quantifying a tumor's morphological details to monitor disease progression and improve treatment planning. Segmentation of kidney tumors in CT images is a challenging task due to low contrast, irregular motion, and diverse shapes and sizes. Furthermore, manual delineation techniques are extremely time-consuming and prone to errors due to variability between specialists. The literature reports applications of 3D Convolutional Neural Networks (CNNs) to the segmentation of kidneys and tumors; while effective, 3D CNNs are computationally expensive. Our work proposes the application of a novel 2D CNN architecture to segment kidneys and tumors from CT images. The proposed architecture uses features from enhanced images to improve segmentation performance. Quantitative and qualitative analysis of the proposed model on the KiTS19 dataset shows improvement over recent state-of-the-art architectures.
To establish stable video operations and services while maintaining a high quality of experience, perceptual video quality assessment has become an essential research topic in video technology. The goal of quality assessment is to predict perceptual quality in order to improve imaging systems' performance. This paper presents a novel visual quality metric for video quality assessment. To address this problem, we study the training of neural networks through robust optimization. A high degree of correlation with subjective quality estimates is achieved by using a convolutional neural network trained on a large number of (video sequence, subjective quality score) pairs. We demonstrate how our predicted no-reference quality metric correlates with qualitative opinion in a human observer study. Results are shown on the MCL-V dataset with comparisons to existing approaches.
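Agreement with subjective opinion is conventionally reported via Pearson and Spearman correlations against mean opinion scores; a minimal sketch with placeholder values (the paper does not state which correlation statistics it uses):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Placeholder arrays: model predictions vs. mean opinion scores (MOS).
predicted = np.array([3.1, 2.4, 4.0, 1.8, 3.6])
mos = np.array([3.3, 2.1, 4.2, 1.5, 3.8])

plcc, _ = pearsonr(predicted, mos)    # linear correlation (PLCC)
srocc, _ = spearmanr(predicted, mos)  # rank-order correlation (SROCC)
print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")
```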
Suicidal ideation, attempts, and deaths among adolescents are a major and growing health concern. In 2019, suicide accounted for 11% of adolescent deaths in the U.S. (the second-leading cause of death among U.S. teenagers). Accurately predicting suicidal thoughts and behaviors (STBs) among adolescents remains challenging. This study aimed to identify the most accurate prediction models for adolescent STBs using machine learning (ML) methods. Predictors were selected by expert-informed and parametric models. The study used data from the Mississippi Youth Risk Behavior Surveillance System (YRBSS), collected from Mississippi public high school students between 2001 and 2019 (inclusive). A broad array of features (survey question responses) was available to train the models, including depression, drug use, bullying, violence, online habits, diet, and sports participation. We applied support vector machine (SVM), random forest, and neural network algorithms to the YRBSS data, with suicidal ideation (consideration) and suicide attempt as the outcome variables. Data-derived ML models performed well in predictive accuracy. Results are compared across the three ML algorithms and three different methods of predictor variable selection. The highest accuracy was achieved with expert-informed models, and the accuracy of predicting suicidal ideation was slightly higher than that of predicting suicide attempt. The difference between ML algorithms was insignificant. These prediction models of suicidal ideation and attempt may help Mississippi public high school educators, parents, and policy makers better target risk behaviors and thus effectively prevent adolescent suicide in Mississippi.
The article presents a noise reduction method based on minimizing a multicriteria objective function. The technique minimizes two criteria: the root-mean-square difference between adjacent pixel-value estimates (vertical and horizontal) and the mean-square difference between the input elements and the resulting estimates. The first criterion reduces the noise component in locally stationary areas of the image, while the second preserves the boundaries between objects. The choice of the processing parameter is adapted using a trained neural network. Training was carried out on standard test images from widely used databases (Kodak, MS COCO, etc.). Tables comparing the effectiveness of the proposed adaptation algorithm to the previously applied approach are given.
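A minimal sketch of minimizing such a two-criteria objective by gradient descent, with a fixed trade-off parameter (the paper adapts this parameter with a trained neural network, which is not reproduced here):

```python
import numpy as np

def denoise(y: np.ndarray, lam: float = 1.0, steps: int = 200, lr: float = 0.05):
    """Gradient descent on a two-criteria objective:
    J(x) = sum((x - y)^2) + lam * sum of squared differences between
    vertically and horizontally adjacent pixel estimates.
    lam trades noise suppression against edge preservation."""
    x = y.astype(float).copy()
    for _ in range(steps):
        grad = 2.0 * (x - y)                       # fidelity criterion
        dv = np.diff(x, axis=0)                    # vertical neighbor differences
        dh = np.diff(x, axis=1)                    # horizontal neighbor differences
        smooth = np.zeros_like(x)
        smooth[:-1, :] -= dv; smooth[1:, :] += dv  # gradient of sum(dv**2) / 2
        smooth[:, :-1] -= dh; smooth[:, 1:] += dh  # gradient of sum(dh**2) / 2
        grad += 2.0 * lam * smooth
        x -= lr * grad                             # descent step
    return x
```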
3-D reconstruction refers to constructing a mathematical representation of the scene geometry. In most existing approaches, Lambertian reflectivity of the objects in the scene is explicitly or implicitly assumed. In practice, the reflectivity of scene objects often differs from Lambert's law, for example, for semi-transparent, transparent, or specular objects and objects with subsurface scattering effects. This paper proposes an algorithm to estimate a surface reflectance model and 3-D shape parameters from multiple views. The proposed algorithm includes the following steps: 1) determination of the optical properties of the scanned scene by separating the direct and global lighting components using high-frequency templates; 2) generation of a set of structured-light patterns whose structure depends on the optical properties of the scanned scene; 3) scanning the scene using the generated structured-light patterns for views of the non-Lambertian surface; and 4) construction of a 3-D model of the scene by triangulation. To determine the best views of the non-Lambertian surface, a 3-D reconstruction algorithm based on a convolutional neural network is proposed. The neural network is trained in two stages. In the first stage, an encoder is trained to produce a descriptor description of the input image. In the second stage, a fully connected neural network is added to the encoder for regression to choose the best views. The encoder is trained using a generative adversarial methodology so that the descriptor description stores spatial information and information about the optical properties of surfaces located in different areas of the image. The encoder-decoder network is trained to recover the defect map (which depends directly on the sensor and scene properties) from a color image. The architecture of the neural network (generator) is based on the U-Net architecture. As a result, this method exploits non-Lambertian properties and can compensate for triangulation reconstruction errors caused by view-dependent reflections. Experimental results on both synthetic and real objects are given.
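Step 1 commonly follows the classic fast-separation result of Nayar et al.; a minimal sketch assuming a 50%-duty shifted high-frequency pattern (the paper's templates may differ):

```python
import numpy as np

def separate_direct_global(stack: np.ndarray):
    """Separate direct and global illumination from a stack of images
    (num_patterns, H, W) captured under shifted high-frequency patterns.
    With a 50%-duty pattern, L_max = L_d + L_g/2 and L_min = L_g/2,
    so L_d = L_max - L_min and L_g = 2 * L_min (Nayar et al., 2006)."""
    L_max = stack.max(axis=0)
    L_min = stack.min(axis=0)
    direct = L_max - L_min
    global_ = 2.0 * L_min
    return direct, global_
```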
The World Health Organization (WHO) has called for a global fight against cervical cancer, with an estimated 569,000 new cases and 310,000 deaths annually. Searching for practical approaches to cervical cancer screening and treatment has been an urgent research subject. One solution could be to use label-free two-photon excited fluorescence (TPEF) imaging to address this need. The colposcopy-guided biopsy method is used for cervical precancer detection, relying primarily on changes in cell and tissue morphology and organization. However, the overall performance of colposcopy and biopsy remains unsatisfactory. Label-free TPEF provides images with high morphological and functional (metabolic) content and could lead to enhanced detection of cervical pre-cancers. This paper uses cell texture and morphology features to classify stacks of such TPEF images acquired from freshly excised healthy and pre-cancerous human cervical tissues. Herein, an automated denoising algorithm and a parametrized edge enhancement method are used to pre-process the images in each stack. In computer simulations performed on a private dataset of 10 healthy stacks and 53 precancer stacks, recall and specificity of 100% were observed for both texture and morphology features. However, the dataset used to acquire these results is small. The presented model can serve as a base model for further research and analysis of a larger dataset to identify early cervical cancerous changes and potentially significantly improve diagnosis and treatment.