1Nanyang Technological Univ. (Singapore) 2Huazhong Univ. of Science and Technology (China) 3China University of Geosciences (Wuhan) (China) 4Wuhan Univ. (China)
This PDF file contains the front matter associated with SPIE Proceedings Volume 12342 including the Title Page, Copyright information, and Table of Contents.
Biometrics is gaining increased attention for secure and efficient verification, particularly at border crossing points. Biometric traits of the human body fall into two broad types: intrusive modalities such as fingerprints, and non-intrusive modalities, termed soft biometrics. Non-intrusive soft biometrics are the baseline technology for making the concept of Smart Borders a reality. One of the biggest challenges in soft-biometrics-based verification is finding a highly related set of features across different modalities of the human body, since a large number of soft biometrics is associated with it. It is therefore extremely useful to select only those soft biometrics that support each other and are relevant to the problem domain. In this work, we thoroughly investigate one of the largest collections of soft biometrics and develop a multiple non-linear regression framework for selecting highly supportive and relevant soft biometrics. We use one of the largest datasets, PETA, and its annotations to evaluate the proposed model. Accuracy is reported as MAE and error-distribution graphs for two global soft biometrics: gender and age prediction.
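As a rough illustration of the idea, the following is a minimal sketch of selecting soft-biometric attributes by scoring non-linear regressions with MAE. It is not the authors' framework: the greedy forward-selection strategy, the quadratic model, and the variable names (X as binary attribute annotations, y as a target such as age) are all assumptions.

```python
# Illustrative only: greedy selection of soft biometrics whose quadratic
# regression against the target yields the lowest cross-validated MAE.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

def forward_select(X, y, n_keep=5):
    selected, remaining = [], list(range(X.shape[1]))
    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    while remaining and len(selected) < n_keep:
        scores = []
        for j in remaining:
            cols = selected + [j]
            pred = cross_val_predict(model, X[:, cols], y, cv=5)
            scores.append((mean_absolute_error(y, pred), j))
        best_mae, best_j = min(scores)   # lowest MAE wins this round
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# X: N x D matrix of soft-biometric annotations (e.g., from PETA),
# y: target attribute such as age. Both are hypothetical placeholders.
```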
Compared with point features, line features provide more geometric information in vision tasks. Although traditional line descriptors have been studied for a long time, learning-based line descriptors still need strengthening. Inspired by the message-passing mechanism of graph neural networks, we propose a new neural network architecture, LDAM, that alternately applies two attention mechanisms to augment line descriptors and extract more line correspondences. Compared with previous methods, ours learns the geometric properties and prior knowledge of images through mutual feature aggregation between a pair of images. Experiments on real data verify the good matching accuracy of LDAM. Furthermore, LDAM is robust to viewpoint change and occlusion.
Human action recognition has been a hot topic in computer vision for both handcrafted and deep learning approaches. In the handcrafted approach, extracted features are encoded to reduce their size. Among the state-of-the-art approaches is encoding these visual features with a Gaussian mixture model. However, codebook size is an issue for computational complexity, especially for large-scale data, which requires encoding with a large codebook. In this paper, we introduce the use of different optimizers to reduce the codebook size while boosting accuracy. To illustrate the performance, we first use improved dense trajectories (IDT) to extract handcrafted features, then encode the descriptors with a Fisher-kernel codebook based on a Gaussian mixture model; a support vector machine classifies the categories. We then apply and compare five stochastic gradient descent optimization techniques to modify the number of Gaussian components. In this manner we select the discriminative foreground features (represented by the final number of Gaussian components) and omit background features. Finally, to show the performance improvement of the proposed method, we apply the technique to two datasets, UCF101 and HMDB51.
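For readers unfamiliar with the encoding step, here is a compact sketch of standard Fisher-vector encoding of local descriptors with a GMM codebook. It follows the common formulation (gradients with respect to means and variances, with power and L2 normalization) and assumes the IDT descriptors have already been extracted; it is not the authors' exact implementation.

```python
# Fisher-vector encoding of local descriptors (T x D) with a diagonal GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    q = gmm.predict_proba(descriptors)                 # T x K soft assignments
    T = descriptors.shape[0]
    mu, sigma = gmm.means_, np.sqrt(gmm.covariances_)  # K x D (diag covariances)
    pi = gmm.weights_                                  # K mixture weights
    diff = (descriptors[:, None, :] - mu) / sigma      # T x K x D
    g_mu = (q[:, :, None] * diff).sum(0) / (T * np.sqrt(pi)[:, None])
    g_sig = (q[:, :, None] * (diff**2 - 1)).sum(0) / (T * np.sqrt(2 * pi)[:, None])
    fv = np.hstack([g_mu.ravel(), g_sig.ravel()])      # 2*K*D dimensions
    fv = np.sign(fv) * np.sqrt(np.abs(fv))             # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)           # L2 normalization

gmm = GaussianMixture(n_components=64, covariance_type='diag')
# gmm.fit(training_descriptors)  # training_descriptors: stacked IDT features
```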
Aggregate shape, angularity and surface texture are closely related to the pavement performance of asphalt mixtures. To quantitatively analyze the morphological characteristics of aggregates, aggregate particle images were obtained by a "backlight scanning" method, and noise removal, segmentation and hole filling were then performed on the acquired images using digital image processing techniques. On this basis, a two-dimensional aggregate morphological characteristics evaluation system (AMCES) with low equipment requirements was developed. The shape of aggregates was characterized by the shape index (SI) and form factor (FF), while angularity and surface texture were evaluated by the angularity index (AI) and texture factor (TF), respectively. Finally, the morphological characteristics of 12 differently shaped standard objects and limestone aggregates of 4 sizes were analyzed. The test results show that the four evaluation parameters describe the morphological characteristics of aggregate particles well. As particle size increases, the shape index decreases, the form factor approaches 1, and the angularity index and texture factor both decrease gradually.
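To make the descriptors concrete, the sketch below computes two common 2D shape measures from a binary particle mask with OpenCV. The formulas are the usual definitions (form factor = 4πA/P², which equals 1 for a perfect circle; elongation from the minimum-area rectangle); the paper's exact parameter definitions may differ.

```python
# Common 2D shape descriptors from a binary particle mask (illustrative).
import cv2
import numpy as np

def shape_descriptors(mask):
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea)       # largest particle
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, closed=True)
    # Form factor: 1.0 for a perfect circle, smaller for irregular outlines.
    form_factor = 4 * np.pi * area / perimeter**2
    # Elongation-style shape index from the minimum-area bounding rectangle.
    (_, _), (w, h), _ = cv2.minAreaRect(c)
    shape_index = max(w, h) / max(min(w, h), 1e-6)
    return form_factor, shape_index
```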
The traditional U-Net model suffers from inaccurate segmentation edges, poor adaptability to multi-scale road targets, and a tendency toward false and missed segmentation when segmenting road targets with varied and changing occlusions. To address these problems, a semantic segmentation model for road scenes based on multi-scale feature extraction and a deep supervision module is proposed. First, a dual attention module is embedded in the U-Net encoder so that the model can capture contextual information across the channel and spatial dimensions at a global scale and enhance road features. Second, before upsampling, the feature map containing high-level semantic information is fed into an ASPP module to obtain road features at different scales. Finally, a deep supervision module is introduced into the upsampling path to learn feature representations at different levels and retain more road detail. Experiments on the CamVid and Cityscapes datasets show that our network effectively segments road targets at different scales, producing more complete and clearer road contours, and improves semantic segmentation accuracy while maintaining a reasonable segmentation speed.
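For reference, a minimal ASPP block in PyTorch, in the DeepLab style the abstract alludes to; the dilation rates shown are the typical defaults, not necessarily the paper's.

```python
# Minimal ASPP: parallel dilated convolutions capture multi-scale context.
import torch
import torch.nn as nn

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
            for r in rates])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        # Concatenate the parallel dilated branches, then fuse with a 1x1 conv.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```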
To address insufficient spectral–spatial feature extraction with limited labeled samples in hyperspectral image classification, this paper proposes a dynamic spectral–spatial multiscale feature extraction network that extracts more discriminative feature information. Unlike fixed-size kernels that extract a single kind of feature, we add dilated convolutions to the multiscale convolution network to fuse neighborhood and global feature information. In addition, a joint spectral–spatial dynamic convolution network constructed with two attention branches is proposed. A spectral attention module is introduced into the dynamic convolution to adaptively enhance the bands useful for classification, making the dynamic convolutional network more effective by reconstructing the feature information obtained from different kernels. Experiments on two common hyperspectral image datasets demonstrate that the proposed method is superior to other state-of-the-art classification methods.
With the rapid development of deep learning models, object detection has achieved great success in recent years. However, two-stage detection models still suffer from low detection efficiency. In this paper, we design a lightweight fully convolutional neural network (LFCNN) as a backbone to extract features more efficiently. First, LFCNN is a lightweight network with only a small number of parameters, so it completes feature extraction quickly while maintaining detection accuracy. Second, LFCNN uses residual connections to preserve the performance of the deep network and dense connections to reuse and fuse multi-layer features, which significantly improves detection accuracy. Moreover, we propose a novel anchor scale generator (ASG) that derives suitable predefined anchor scales for generating more accurate region proposals, further enhancing object localization. Extensive experiments on the Pascal VOC and COCO datasets show that our approach is superior to other methods in both bounding-box localization accuracy and detection performance.
Human pose estimation and person detection are two fundamental tasks of human behavior analysis, and each has progressed remarkably since the advent of convolutional neural networks. Recently, researchers have paid more attention to one-stage human pose estimation and person detection for practical applications. However, little work has addressed completing both tasks simultaneously in a single network, for two main reasons: (1) designing an effective mechanism that fully exploits their relevance and complementarity to achieve joint progress, especially in pose estimation accuracy, is challenging, and (2) the evaluation bias caused by differing scale sensitivity remains unsolved. To address these problems, we propose a multi-task model for simultaneous human pose estimation and person detection, named PersonPD (person pose and person detection). It predicts keypoint heatmaps and regresses a 4D relative displacement vector (l, t, r, b) that encodes the person bounding box and also serves as a grouping clue for keypoints. A maximum-IoU matching algorithm, named IoU-grouping, groups body joints into individual persons while generating accurate person detection results. With this simple but effective method, our model achieves competitive person detection and pose estimation performance on the COCO dataset.
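The grouping idea can be sketched as follows: each keypoint's (l, t, r, b) displacement implies a person box, and keypoints are assigned to the detected person whose box overlaps most. The function and variable names below are illustrative, not the authors' code.

```python
# Sketch of IoU-based keypoint grouping from regressed displacements.
import numpy as np

def box_from_displacement(x, y, l, t, r, b):
    """Decode the 4D displacement at keypoint (x, y) into a person box."""
    return np.array([x - l, y - t, x + r, y + b])

def iou(a, b):
    x1, y1 = np.maximum(a[:2], b[:2])
    x2, y2 = np.minimum(a[2:], b[2:])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda v: (v[2] - v[0]) * (v[3] - v[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def group_keypoints(keypoint_boxes, person_boxes, thr=0.5):
    """Assign each keypoint to the person box with maximum IoU."""
    groups = [[] for _ in person_boxes]
    for k, kb in enumerate(keypoint_boxes):
        ious = [iou(kb, pb) for pb in person_boxes]
        j = int(np.argmax(ious))
        if ious[j] >= thr:
            groups[j].append(k)
    return groups
```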
Deep learning has been widely used for ship target detection in synthetic aperture radar (SAR) images. Existing research mainly uses anchor-based detection methods that generate candidate boxes to extract specific targets. However, such methods require additional computing resources to filter out many repeated candidate boxes, leading to poor target positioning accuracy and low detection efficiency. To solve these problems, this paper constructs an anchor-free framework for ship target detection in SAR images. An improved lightweight detection method based on target key points is proposed for real-time detection, achieving rapid and accurate positioning of ship targets in SAR images. Experimental results show that the proposed method has better detection performance and stronger generalization capability, which benefits real-time detection of ship targets.
Recently, lane detection has made great progress in autonomous driving. RESA (REcurrent Feature-Shift Aggregator) is based on image segmentation and presents a novel module that enriches lane features after preliminary feature extraction with an ordinary CNN. Scenes in the Tusimple dataset are not overly complicated, and lanes there have prominent spatial features. Building on RESA, we introduce position embedding to enhance these spatial features. Experimental results show that the method achieves a best accuracy of 96.93% on the Tusimple dataset.
The classical edge detection method Canny is easily affected by noise and has low detection accuracy when applied to SAR target images. This paper therefore studies, for the first time, the performance on SAR target images of Canny and of the CNN-based edge detection methods Holistically-Nested Edge Detection (HED) and Richer Convolutional Features (RCF). Performance is evaluated on the MSTAR dataset, and the results of each method are compared with common edge detection metrics: F-measure, PR curve, and FPS. Canny achieves an F-measure (ODS) of 0.611 at 43 FPS; HED achieves 0.758 at 18 FPS; RCF achieves 0.729 at 24 FPS; and RCF-MS achieves 0.753 at 6 FPS. On MSTAR, HED has the best F-measure, 24.06% higher than Canny; RCF and RCF-MS also perform well, 19.31% and 23.24% higher than Canny, respectively. The CNN-based methods have higher F-measures, are less affected by noise, and lose fewer edge details. Applied to SAR images affected by speckle noise, they perform much better than Canny, though their computing speed remains somewhat slower.
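For the classical baseline, the Canny comparison above can be reproduced with a few lines of OpenCV; the file path and thresholds below are illustrative, and mild Gaussian smoothing is a common pre-step against speckle.

```python
# Baseline Canny edge detection on a SAR chip (illustrative parameters).
import cv2

img = cv2.imread("mstar_chip.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
img = cv2.GaussianBlur(img, (5, 5), 0)    # mild smoothing against speckle
edges = cv2.Canny(img, threshold1=50, threshold2=150)
cv2.imwrite("edges.png", edges)
```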
Ship detection is important for guaranteeing maritime safety. In optical remote sensing images, detection efficiency and accuracy are limited by complex ocean backgrounds and variant ship directions. Therefore, we propose a novel ship detection method consisting of two main stages: candidate area location and target discrimination. In the first stage, we use the spectral residual method to compute a saliency map of the original image, obtain the saliency sub-map containing ship targets, and then use threshold segmentation to obtain ship candidate regions. In the second stage, we compute the radial gradient histogram of each candidate region and transform it into a rotation-invariant radial gradient feature. Radial gradient features and LBP features are then fused, and an SVM performs ship detection. Experimental results show that the method has low complexity and high detection accuracy.
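A minimal version of the first stage, spectral-residual saliency in the style of Hou and Zhang, is sketched below; kernel sizes and the Otsu thresholding step are illustrative choices, not the paper's exact settings.

```python
# Spectral-residual saliency map for candidate ship region extraction.
import cv2
import numpy as np

def spectral_residual_saliency(gray):
    f = np.fft.fft2(gray.astype(np.float32))
    log_amp = np.log(np.abs(f) + 1e-8)
    phase = np.angle(f)
    # Spectral residual = log amplitude minus its local average.
    residual = log_amp - cv2.blur(log_amp, (3, 3))
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = cv2.GaussianBlur(saliency, (9, 9), 2.5)
    return cv2.normalize(saliency, None, 0, 1, cv2.NORM_MINMAX)

# Candidate regions then come from thresholding, e.g.:
# _, mask = cv2.threshold((sal * 255).astype('uint8'), 0, 255,
#                         cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```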
Dock targets in remote sensing images are slender and arbitrarily oriented. General CNN-based target detection algorithms cannot effectively obtain the orientation of such targets and therefore cannot meet the practical demands of dock detection. This study designed an arbitrarily oriented detection architecture based on the YOLOv4 algorithm to resolve these problems. First, a multidimensional coordinate method was used to annotate dock targets so that the network could encode target orientation. Second, the loss function was optimized to suit oriented target detection. Finally, an attention mechanism was introduced to enhance feature extraction and further improve detection accuracy. Experiments on two remote sensing dock detection datasets showed that the improved YOLOv4 network outperformed the other networks in the dock detection task.
For the inspection of plastic gears, most factories still use manual measurement tools, so defect detection requires tremendous effort during production. This paper proposes a new method for detecting defects in plastic gears during production and recycling. An image dataset of different kinds of plastic gears was created, and a defect detection deep learning (DL) model based on GoogLeNet was proposed to classify gears as having missing teeth (MT), edge fins (EF), or good quality (GQ). On an independent test dataset, the model reached an accuracy of 94.8%. By combining machine vision (MV) and DL methods, this paper realizes automatic detection of plastic gear defects, and the method is verified experimentally on the independent plastic gear dataset. The results have theoretical value and practical significance for freeing manpower and advancing the automation of plastic gear defect detection.
Face forgery technology has evolved to the point where fake faces are increasingly difficult to distinguish from real ones, and forged face videos spreading on social media can cause social unrest or damage personal reputations. A face tampering detection method (RALNet) with a spatiotemporal attention residual network is designed to reduce the misuse of face data through malicious dissemination. First, we propose a pipeline to extract video face data, which reduces interference from irrelevant information and improves data-processing utilization. Then, exploiting the spatial and temporal incoherence and inconsistency of tampered videos, spatial- and temporal-domain features of the target face video are extracted by a residual network with an attention mechanism and a long short-term memory network, and the targets are classified as real or fake. Experimental results show that the method effectively detects whether face data have been tampered with, and its detection accuracy is better than that of other methods. It also performs well in recall, precision, and F1 score.
Ship target detection is of great significance in marine surveillance, rescue, and related applications. To improve detection performance, we propose a ship target detection method based on multi-task learning, with two main contributions. First, we design a multi-task learning model by integrating a segmentation module into the Faster R-CNN model; through feature sharing and joint learning, segmentation assists in improving detection accuracy. Second, to handle the impact of initial anchor scale on detection accuracy, we introduce an adaptive anchor aspect-ratio setting method based on an improved k-means algorithm; adaptively selecting initial anchor sizes suited to ship targets further improves detection accuracy. Moreover, we constructed an extended ship image dataset of 14,614 images in 13 categories. Experimental results demonstrate that the proposed model effectively improves ship detection accuracy, and comparison and ablation experiments further validate that multi-task joint learning and adaptive anchor sizing are helpful for ship target detection.
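The common baseline for this anchor-adaptation step is k-means over ground-truth box sizes with an IoU-based distance, sketched below; the details of the paper's "improved" variant are not reproduced, and the names are illustrative.

```python
# k-means anchor estimation with 1 - IoU as the distance (standard recipe).
import numpy as np

def iou_wh(wh, centers):
    """IoU between boxes of size wh (N x 2) and centers (K x 2),
    assuming aligned top-left corners."""
    inter = np.minimum(wh[:, None, 0], centers[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centers[None, :, 1])
    union = wh.prod(1)[:, None] + centers.prod(1)[None, :] - inter
    return inter / union

def kmeans_anchors(wh, k=9, iters=100):
    centers = wh[np.random.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centers), axis=1)  # max IoU = min dist
        new = []
        for j in range(k):
            members = wh[assign == j]
            new.append(members.mean(0) if len(members) else centers[j])
        centers = np.array(new)
    return centers  # anchor (width, height) pairs

# wh: N x 2 array of ground-truth box sizes from the ship dataset.
```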
This research detects plant protection products on plant leaves using excited luminescence. Tests were carried out on the leaves of two randomly chosen plant species and four typical plant protection products. Excitation and emission (EX-EM) spectra were measured on an Edinburgh Instruments FS900 luminescence spectrometer equipped with a surface-measurement attachment. The EX-EM characteristics of clean leaves were compared with those of leaves coated with plant protection products and with those of the agents themselves. The results allowed an assessment of the suitability of the excited luminescence method for measuring plant protection spraying, including detecting or identifying inappropriate use of plant protection products by a farmer.
Accurate and fast detection of fabric defects is of great significance for improving the production efficiency of textile enterprises. However, fabric defects exhibit large scale changes, small objects, and unbalanced class counts. Therefore, a fabric defect detection method integrating deformable convolution and self-attention is proposed. By combining multiscale feature extraction with deformable convolution, the algorithm alleviates the model's insufficient ability to extract irregular defect features. A dual-channel feature fusion designed with the self-attention mechanism adaptively adjusts and fuses features to obtain more effective representations, compensating for the low detection rate of small defects. Finally, an adaptive bounding-box generator in the region proposal network produces more accurate object bounding boxes for subsequent detection and regression. Experimental results show that the proposed method performs well and effectively improves the accuracy and efficiency of fabric defect detection.
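As a reference for the building block, here is minimal use of deformable convolution via torchvision; the offset-prediction conv and block structure are a common pattern, not the paper's full fusion module.

```python
# Deformable convolution block: a plain conv predicts per-location
# sampling offsets (2 per kernel tap), which DeformConv2d consumes.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
        self.deform = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, x):
        return self.deform(x, self.offset(x))

x = torch.randn(1, 64, 32, 32)
y = DeformBlock(64, 128)(x)   # -> 1 x 128 x 32 x 32
```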
The free and open optical sensor data from the Sentinel-2 constellation can support oil spill monitoring operations. Sentinel-2 offers several spectral bands from visible to shortwave infrared at medium to high resolution, suitable for detecting oil spills. However, the spectral signature of an oil spill is often similar to that of the surrounding environment, and it also depends on many parameters, such as sensing angle, sea depth, and wave characteristics. In this paper, we propose a method for detecting oil spills in Sentinel-2 images. It is based on the Mixed Normalized Difference Index (MNDI), derived from the Normalized Difference Vegetation Index (NDVI) and a reversed version of the Normalized Difference Index (NDI) used in forest fire monitoring. This index gives highly varying values in the oil spill area, so the spill extent can be estimated by observing the spatial roughness of the index. Four study areas in Saudi Arabia, Greece, Azerbaijan, and Indonesia were used to evaluate detection performance against current methods. The visualized results show that our algorithm gives noticeable results with high contrast and low noise, except for the oil spills in Greece.
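For concreteness, any normalized-difference index follows the same recipe; for NDVI on Sentinel-2, band_a is B8 (NIR) and band_b is B4 (red). The exact band pairing of the MNDI is defined in the paper and is not assumed here.

```python
# Generic normalized-difference index on two co-registered bands.
import numpy as np

def normalized_difference(band_a, band_b):
    a = band_a.astype(np.float32)
    b = band_b.astype(np.float32)
    # e.g., NDVI = (NIR - Red) / (NIR + Red); epsilon avoids divide-by-zero.
    return (a - b) / (a + b + 1e-6)
```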
Breast cancer is the most common form of invasive cancer in women. In recent years, it has become standard practice to evaluate breast masses using ultrasound (US) imaging. Compared with other medical imaging modalities such as MRI, US in the hands of skilled radiologists can accurately distinguish malignant from benign breast masses. Human domain knowledge is difficult to incorporate into breast tumour diagnosis because tumours differ greatly from person to person in shape, border, curve, intensity, and other commonly used medical priors. We use a deep learning model that incorporates visual saliency to segment breast tumours in ultrasound images; radiologists use "visual saliency" to refer to areas of an image that are more likely to be noticed. The proposed method learns features that prioritize spatial regions with high saliency levels. According to validation results, models with attention layers identify tumours more accurately than those without. The salient attention model can improve the accuracy and robustness of medical image analysis by allowing deep learning architectures to incorporate task-specific knowledge. Our model scores higher in AUC-ROC, Dice score, precision, recall, and IoU.
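The segmentation metrics reported above have standard definitions on binary masks; the sketch below uses the usual formulas, with an illustrative 0.5 threshold on predicted probabilities.

```python
# Dice and IoU between a predicted probability map and a binary ground truth.
import numpy as np

def dice_iou(pred, target, thr=0.5):
    p = (pred >= thr).astype(bool)
    t = target.astype(bool)
    inter = np.logical_and(p, t).sum()
    dice = 2 * inter / (p.sum() + t.sum() + 1e-9)
    iou = inter / (np.logical_or(p, t).sum() + 1e-9)
    return dice, iou
```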
Salient object detection is a fundamental problem in image and vision research. Traditional models have low confidence and low recall; deep learning methods locate objects better, but the detected boundaries are often not detailed enough. To address these issues, we propose a salient object detection model (RF2Net) that combines traditional level-set methods with deep learning. RF2Net adds a level-set structured loss and a reverse attention mechanism on top of F3Net. First, RF2Net uses a new loss function that combines BCE (binary cross entropy) loss, a weighted level-set loss, and a weighted MAE (mean absolute error) loss under multi-indicator joint supervision. Through the level-set loss operator, the model attends to the image as a whole instead of relying solely on the pixel-by-pixel supervision of BCE loss. The reverse attention mechanism effectively reduces noise during inter-layer feature fusion, improving accuracy. We compare against 12 state-of-the-art methods on 4 datasets; on the HKU-IS dataset, our MAE, maxF, and avgF all surpass the other algorithms. We also run ablation experiments on the DUTS and ECSSD datasets, which show that the proposed algorithm effectively improves salient object detection.
Detecting roads in high-resolution photographs can serve forestry, agriculture, traffic, and even military applications, producing significant social and economic value. In this paper, we present a novel method that uses flatness and connectivity to detect roads in high-resolution aerial images: it iterates over probable road locations using flatness and connects road segments using connectivity. First, we introduce the concept of a 'footprint', which reveals the probable location and extension direction of a road. Given an initial footprint, we assess the flatness between locations to find the next footprint. By iterating and connecting footprints, our approach produces a set of connected line segments that trace the road to be detected. In addition, a footprint initialization algorithm makes the method fully automatic, and a road-network pruning algorithm makes the result cleaner and more accurate. Tested on three high-resolution aerial photographs, our method achieved an accuracy of more than 80%. The algorithm suits road detection and other linear-target detection in high-resolution aerial photographs; since it requires neither handcrafted features nor training data, it can be deployed quickly.
Weight sharing across locations makes Convolutional Neural Networks (CNNs) shift invariant: weights learned in one location can recognize objects in other locations. However, such weight sharing has been lacking in Rotated Pattern Recognition (RPR) tasks, so CNNs must learn training samples in different orientations by rote, which greatly increases training difficulty. We propose a new solution for RPR tasks, Pre-Rotation Only At Inference time (PROAI), which provides CNNs with rotation invariance. The core idea of PROAI is to share CNN weights across multiple rotated versions of the test sample. At training time, a CNN is trained with samples at only one angle; at inference time, test samples are pre-rotated at different angles and fed into the CNN to compute classification confidences; finally, both the category and the orientation are predicted from the position of the maximum confidence. With PROAI, recognition ability learned at one orientation generalizes to patterns at any other orientation, and both the parameter count and the training time of CNNs in RPR tasks are greatly reduced. Experiments show that PROAI enables CNNs with fewer parameters and less training time to achieve state-of-the-art classification and orientation performance on both the rotated MNIST and rotated Fashion-MNIST datasets.
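The inference step described above can be sketched directly: rotate the test image over a set of angles, score each version with the single-orientation CNN, and read class and orientation off the maximum confidence. Function names and the 10-degree step are illustrative.

```python
# PROAI-style inference: argmax over (angle, class) confidence grid.
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def proai_predict(model, image, angles=range(0, 360, 10)):
    scores = []
    for a in angles:
        logits = model(TF.rotate(image, a).unsqueeze(0))  # image: C x H x W
        scores.append(logits.softmax(dim=1))
    s = torch.cat(scores)                  # n_angles x n_classes
    flat = int(s.argmax())
    angle_idx, cls = divmod(flat, s.shape[1])
    return cls, list(angles)[angle_idx]    # predicted class and orientation
```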
With the advent of large-scale video datasets, action recognition using three-dimensional convolutions (3D CNNs), which capture spatiotemporal information, has become mainstream. For classroom student behavior recognition, this paper adopts an improved SlowFast network to handle spatial structure and temporal events separately. First, DropBlock (a regularization method) is added to the SlowFast network to mitigate overfitting. Second, to address the long-tailed distribution of classes, the designed Smooth Sample (SS) loss is added to the network to smooth the influence of sample counts. Classification experiments show that, compared with similar methods, our model's accuracy on the Kinetics and Student Action datasets increases by 2.1% and 2.9%, respectively.
Due to action occlusion and the information loss caused by view changes, view-invariant human action recognition is challenging in many real-world applications. One possible solution is to minimize the representation discrepancy between views while learning discriminative features for view-invariant action recognition. To this end, we propose a Spatio-temporal Dual-Attention Network (SDA-Net) for view-invariant human action recognition. The SDA-Net is composed of spatial/temporal self-attention and spatial/temporal cross-attention modules. The self-attention module captures global long-range dependencies of action features, while the cross-attention module learns view-invariant co-occurrence attention maps and generates discriminative features for a semantic representation of actions across views. We exhaustively evaluate our approach on the NTU-60, NTU-120, and UESTC datasets under multiple evaluation protocols, i.e., cross-subject, cross-view, cross-set, and arbitrary-view. Extensive experimental results demonstrate that our approach exceeds state-of-the-art approaches by a significant margin in view-invariant human action recognition.
Action recognition methods based on human skeletons can clearly express human actions. Given the computational complexity and high cost of current mainstream action recognition networks, we present a lightweight graph convolutional network with multiple data streams. First, four characteristic data streams are fused with a multi-stream data fusion algorithm, producing the best result with only one training session and minimizing computational complexity. Second, a non-local graph convolution module built on the graph convolutional network collects global information and increases recognition accuracy. Finally, spatial and temporal Ghost graph convolution modules further reduce the network's computational complexity. On the NTU60 RGB+D and NTU120 RGB+D action recognition datasets, our methods achieve highly competitive performance, with average precision of 96.4% and 87.5%, respectively.
COVID-19 and its variants have posed a large risk to people around the world since the outbreak of the disease, and techniques such as AI are being explored to help combat the epidemic. Because people worldwide are required to wear masks, new challenges arise for masked facial region recognition: when facial regions are occluded by masks, face detection algorithms can fail. In this paper, we propose a method to recognize masked faces in three parts. First, the human pose is estimated with OpenPose to produce a series of key points. Second, a key-point location strategy is designed to capture masked facial regions and locate face positions accurately. Third, the broad learning system, an incremental learning algorithm, recognizes the classes of candidate regions. Experiments on several datasets demonstrate the effectiveness of the proposed method.
We propose a novel LBCNN model with AM-Softmax, based on the bilinear CNN (BCNN) and the AM-Softmax loss function, which better fits fine-grained bird recognition tasks. There are two main contributions. First, to reduce model size and recognition time, we design a lightweight BCNN with fewer parameters: we replace the original VGG16 backbone with a MobileNet structure, which decomposes convolution into two smaller operations, depthwise convolution and pointwise convolution. Second, to compensate for the resulting drop in accuracy, we introduce the Additive Margin Softmax (AM-Softmax) loss function to enhance discrimination. Through a comprehensive discussion of different parameter settings and loss functions, we test the proposed lightweight BCNN on the CUB-200-2011 bird dataset. Experimental results demonstrate that the proposed model achieves comparable results with far fewer parameters.
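A compact AM-Softmax loss in its standard formulation is sketched below; the scale s and margin m shown are typical values, not necessarily the paper's hyperparameters.

```python
# AM-Softmax: cosine logits with an additive margin on the target class.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmax(nn.Module):
    def __init__(self, feat_dim, n_classes, s=30.0, m=0.35):
        super().__init__()
        self.W = nn.Parameter(torch.randn(feat_dim, n_classes))
        self.s, self.m = s, m

    def forward(self, feats, labels):
        # Cosine similarity between L2-normalized features and class weights.
        cos = F.normalize(feats, dim=1) @ F.normalize(self.W, dim=0)
        margin = F.one_hot(labels, cos.shape[1]).float() * self.m
        return F.cross_entropy(self.s * (cos - margin), labels)
```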
Visible-infrared person re-identification (VI-ReID) aims to match person images across cameras of different modalities, addressing the limitation of visible-only ReID in dark environments. It is a very challenging task, as images of the same identity exhibit huge discrepancies across modalities. To address this problem, a cross-modality ReID model based on sample diversity and identity consistency is proposed in this paper. For sample diversity, auxiliary images are introduced based on the idea of information transfer; they combine information from visible and infrared images and improve the diversity of the input data and the robustness of the network. For identity consistency, homogeneous and heterogeneous distance losses are developed from four perspectives to shorten the distance between samples of the same identity. Extensive experimental results demonstrate the effectiveness of the proposed method.
Fruit juices and mixed vegetable-fruit juices provide our bodies with many valuable nutritional ingredients and play a major role in preventing numerous illnesses, and raspberries are a valuable source of bioactive compounds. To preserve food, with the aim of extending the stability of products available only in season, the researchers used spray drying. In the experimental part of the study, samples were prepared as raspberry powders obtained by dehumidified spray drying. Based on this research, a neural model was built to evaluate the quality of powder samples from their color. The devised neural network reached a classification accuracy of 0.924.
Low-resolution face recognition (LRFR) aims to identify unknown poor-quality face images and is widely employed in real-world surveillance applications. While collecting a large-scale labeled low-resolution (LR) face dataset would help, it is practically infeasible due to labor costs and privacy issues; in contrast, accessing high-resolution (HR) face datasets is relatively effortless. However, prevailing domain adaptation techniques are often tenuous because they demand similar face images shared across resolutions. We propose disjoint-identity resolution adaptation (DIRA) to transfer substantial face semantic representations from HR to LR face images, despite disjoint identities and limited labeled LR images. We attribute the effective feature extraction and discriminative LR face representation to continuous adversarial learning between HR-LR resolution alignment and segregation. Our experimental results show a notable performance boost over recent state-of-the-art methods on the challenging realistic low-resolution face recognition task.
Because counterfeits severely affect trading in second-hand luxury bag markets, this research studies methods to classify the genuineness of 'Gucci GG Canvas' bags, designing deep Convolutional Neural Network (CNN) models for binary classification with pretrained VGG16 and DenseNet121 backbones. The CNN with DenseNet121 achieves 95% accuracy, higher than the two prior models, i.e., a CNN trained from scratch and a CNN with VGG16.
Masked face recognition has become an important issue for prevention and monitoring during the COVID-19 outbreak. Because masks remove facial features, standard unmasked face recognition cannot identify a specific person well. Current masked-face methods focus on local features from the unmasked regions or recover masked faces to fit standard face recognition models; they use only partial information, so their features are not robust enough for complex situations. To solve this problem, we propose a joint feature aggregation method for robust masked face recognition. First, we design a multi-module feature extraction network with a local module (LM), a global module (GM), and a recovery module (RM). Our method extracts global features from the original masked faces as well as local features from the unmasked area, which is a discriminative part of masked faces. Specifically, we use a pretrained recovery model to recover masked faces and extract recovery features from the recovered faces. Finally, the features from the three modules are aggregated into a joint feature that enhances the representation of masked faces, making it more discriminative and robust than in previous methods. Experiments show that our method achieves better performance than previous methods on the LFW dataset.
Weeds are a common issue in agriculture, and image-based weed identification has regained popularity in recent years as computing power increases. Researchers have successfully applied weed detection in crop fields, combining sensors (e.g., cameras) with mechanical devices such as robotic weeders to locate weeds, and many studies have addressed binary classification between grass and weeds. However, no excellent and comprehensive weed dataset exists in practice, because weed species look similar and are difficult for non-specialists to identify, and distinguishing weeds from grassland is challenging given their similar colors, sizes, and shapes. We investigate three weeds (Bitter Gentian, Hawk's Beard, Pedunculate) that are relatively common in grasslands and select typical grassland dominated by these weeds for data collection, building a natural, effective dataset that generalizes to actual grassland scenes. Second, we extract image features, including color features, histograms, and histograms of oriented gradients (HOG), and form various combinations to reflect the actual characteristics of the weeds accurately and comprehensively. Third, we propose a "core zone" algorithm to locate the weeds, mainly using image processing techniques such as threshold segmentation and morphological transformations. Experiments show that our binary classifier is more accurate than the comparison method and that the multi-class classifier also achieves high accuracy; in addition, the weed-location algorithm is more efficient than the comparative method.
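A minimal sketch of the kind of hand-crafted feature vector described above, concatenating a per-channel color histogram with HOG via scikit-image; bin counts and cell sizes are illustrative, not the paper's settings.

```python
# Color histogram + HOG feature vector for a weed image patch.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

def weed_features(patch_rgb):
    # Color histogram: 8 bins per RGB channel, normalized to sum to 1.
    hist = [np.histogram(patch_rgb[..., c], bins=8, range=(0, 255))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype(np.float32)
    hist /= hist.sum() + 1e-9
    # HOG on the grayscale patch.
    h = hog(rgb2gray(patch_rgb), orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
    return np.concatenate([hist, h])   # feed to an SVM or similar classifier
```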
Rotated target recognition is a challenge for Convolutional Neural Networks (CNNs), and the usual solution is to make a CNN rotation invariant through data augmentation. However, data augmentation makes a CNN prone to overfitting on small sonar image datasets and increases its parameter count and training time. This paper proposes to recognize rotated targets in sonar images with a novel CNN with Rotated Inputs (RICNN) that needs no data augmentation. During training, RICNN is trained with sonar images of targets at only one orientation, which avoids learning multiple rotated versions of the same targets and reduces both the number of parameters and the training time. During testing, RICNN computes classification scores for each test image and all of its rotated versions; the maximum of these scores simultaneously estimates the category and orientation of each target. In addition, to improve the generalization of RICNN on imbalanced sonar datasets, this paper designs an imbalanced data sampler. Experiments on a self-made small, imbalanced sonar rotated-target recognition dataset show that the improved RICNN achieves 4.25% higher classification accuracy than data augmentation while reducing the number of parameters and training time to 2.25% and 19.2% of the data augmentation method, respectively. Moreover, RICNN achieves orientation estimation accuracy comparable to a CNN orientation regressor trained with data augmentation. Code and dataset are publicly available.
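One common way to realize the imbalanced sampler mentioned above is inverse-frequency weighting with PyTorch's built-in WeightedRandomSampler, sketched below; the paper's exact sampler design may differ.

```python
# Inverse-frequency weighted sampling for an imbalanced sonar dataset.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(dataset, labels, batch_size=32):
    labels = torch.as_tensor(labels)
    class_counts = torch.bincount(labels)
    weights = 1.0 / class_counts[labels].float()  # rarer class, higher weight
    sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```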
This paper presents a dataset for facial expression analysis and facial animation. Nearly all existing Facial Action Coding System-based datasets that include facial action unit (AU) intensity information annotate intensity hierarchically using A–E levels. However, facial expressions change continuously and shift smoothly from one state to another, so it is more effective to regress the intensity values of local facial AUs to represent whole facial expression changes, particularly for expression transfer and facial animation. We introduce an extension of FEAFA combined with the relabeled DISFA database, now available at http://www.iiplab.net/feafa+/. Extended FEAFA (FEAFA+) includes 154 video sequences from FEAFA and DISFA, with a total of 230,184 frames manually annotated with floating-point intensity values for 24 redefined AUs using the Expression Quantitative Tool. We list crude numerical results for the posed and spontaneous subsets and provide a baseline comparison for the AU intensity regression task.
Deep learning methods have shown promising performance in decoding specific task states from functional magnetic resonance imaging (fMRI) of the human brain. However, they lack transparency in their decision making: it is not straightforward to visualize the features on which a decision was based. In this study, we decoded four sensorimotor tasks from 3D fMRI with a 3D Convolutional Neural Network (3DCNN), and then adopted the Grad-CAM algorithm to provide visual explanations from the deep network to support the decoding decisions.
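A compact Grad-CAM for a 3D CNN follows the standard recipe: gradients of the target class with respect to the last convolutional feature maps weight those maps into a coarse relevance volume. The sketch below is generic, not the authors' code; the layer passed in is a hypothetical choice.

```python
# Generic Grad-CAM over a 3D convolutional layer (volume: C x D x H x W).
import torch
import torch.nn.functional as F

def grad_cam_3d(model, volume, target_class, conv_layer):
    acts, grads = {}, {}
    h1 = conv_layer.register_forward_hook(
        lambda m, i, o: acts.update(v=o))
    h2 = conv_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(v=go[0]))
    model.zero_grad()
    logits = model(volume.unsqueeze(0))
    logits[0, target_class].backward()
    h1.remove(); h2.remove()
    # Channel weights: global-average-pooled gradients.
    w = grads['v'].mean(dim=(2, 3, 4), keepdim=True)
    cam = F.relu((w * acts['v']).sum(1, keepdim=True))   # 1 x 1 x d x h x w
    cam = F.interpolate(cam, size=volume.shape[1:], mode='trilinear',
                        align_corners=False)
    return (cam / (cam.max() + 1e-9)).squeeze()          # D x H x W relevance
```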
The 3D scene reconstruction task is the basis for implementing mixed reality, but traditional single-image scene reconstruction algorithms struggle to generate regularized models. We believe this is caused by a lack of prior knowledge, so we introduce the model collection ShapeNet to address the problem. In addition, our approach incorporates traditional model generation algorithms. The predicted artificial indoor objects serve as indicators to match models in ShapeNet; the refined models selected from ShapeNet then replace the rough ones to produce the final 3D scene. These selected models from the model library greatly improve the aesthetics of the reconstructed 3D scene. We test our method on the NYU-v2 dataset and achieve pleasing results. Our project is publicly available at https://sjtu-cv-2021.github.io/Single-Image-3D-Reconstruction-Based-On-ShapeNet.
Structure from Motion (SfM) is the cornerstone of 3D reconstruction and of visualization in SLAM. Existing deep learning approaches formulate the problem as recovering the pose between two consecutive frames or predicting a depth map from a single image, both of which are ill-suited formulations. To resolve this mismatch and further tap the potential of neural networks in SfM, this paper proposes a new optimization model for deep SfM based on recurrent neural networks. The model consists of two cost-based architectures for depth and pose estimation, which are iteratively and alternately updated so that each improves the other. The neural optimizer designed here tracks historical information across iterations to minimize the feature-metric cost while updating depth and camera poses. Experiments show that the proposed optimization model outperforms previous methods, effectively reducing the feature-metric cost while refining depth and poses.
Autism spectrum disorder is a heterogeneous neurological disorder, and early diagnosis is critical for applying effective treatment. At present, most diagnoses are based on behavioral observation of symptoms. With the development of deep learning in recent years, an increasing number of approaches use magnetic resonance imaging. However, interfering elements and the insignificant differentiation between positive and negative samples seriously affect classification performance. In this paper, a multi-scale information fusion mechanism combined with attention sub-nets is proposed to establish an end-to-end classification model, which selects appropriate fusion strategies for the outputs of different layers of the convolutional neural network to make comprehensive use of information at different levels of the image. Experiments are conducted on the Autism Brain Imaging Data Exchange dataset. The results show that the proposal achieves better performance than the compared models.
Existing 3D face alignment and face reconstruction methods mainly focus on model accuracy; when applied to dynamic videos, their stability and accuracy drop significantly. To overcome this problem, we propose a novel regression framework that strikes a balance between accuracy and stability. First, on the basis of a lightweight backbone, an encoder-decoder structure is used to jointly learn expression details and a detailed 3D face from video images, recovering shape details and their relationship to facial expression; dynamic regression of a small number of 3D face parameters effectively improves speed and accuracy. Second, to further improve the stability of face landmarks in video, a jitter loss function with joint learning over multiple frames is proposed to strengthen the correlation between frames, reducing the amplitude of landmark differences between adjacent frames and thus the jitter of face landmarks. Experiments on several challenging datasets verify the effectiveness of our method.
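A temporal-smoothness penalty of this kind can be written in a few lines; the exact formulation below (mean L2 displacement of landmarks between adjacent frames) is an assumed stand-in for the paper's jitter loss, not its published form:

```python
import torch

def jitter_loss(landmarks):
    """landmarks: (T, N, 2) predicted 2D landmarks over T consecutive frames.
    Penalizes large frame-to-frame displacement to suppress jitter."""
    diffs = landmarks[1:] - landmarks[:-1]   # (T-1, N, 2) per-frame motion
    return diffs.norm(dim=-1).mean()
```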
Pointed-object detection is of great importance for human-machine interaction, but attempts to solve this task run into a lack of large-scale datasets, since people rarely record 3D scenes with a human pointing at specific objects. To mitigate this gap, we cultivate the first benchmark dataset for the task: PointIt3D (available at https://pan.baidu.com/share/init?surl=E3u96E7dEXnrR1dDris_1w, access code: jps5). It currently contains 347 scans and can be easily scaled up to facilitate future use; it is constructed automatically from existing 3D scenes in ScanNet and 3D people models using our novel synthesis algorithm, which achieves an acceptance rate of more than 85% according to three experts' assessments and will hopefully pave the way for further studies. We also provide a simple yet effective baseline based on anomaly detection and majority-voting pointline generation, which achieves an accuracy of 55.33% on our dataset, leaving much room for improvement. Code will be released at https://github.com/XHRlyb/PointIt3D.
Autonomous driving holds the promise of revolutionizing our lives and society. Robot drivers will run errands such as commuting, parking cars, or taking kids to school. It is expected that, by mid-century, humans will drive only for pleasure. Autonomous vehicles will increase the efficiency and safety of the transportation system by reducing accidents and increasing overall system capacity. Current autonomous driving systems are based on supervised learning, which relies on massive labeled data; producing such datasets takes considerable time, resources, and manpower. While this approach achieves remarkable results, the effort required to produce data becomes a limiting factor for general driving scenarios. This research explores Reinforcement Learning to advance autonomous driving models without labeled data. Reinforcement Learning is a learning paradigm that uses the concept of rewards to discover autonomously, through trial and error, how to solve a task. This work uses the LiDAR sensor as a case study to explore the effectiveness of Reinforcement Learning in interpreting complex data. LiDAR provides a dynamic, high-resolution spatiotemporal map of the environment and could be one of the key sensors for autonomous driving.
Magnetic resonance image (MRI) reconstruction from undersampled k-space data using unsupervised learning methods suffers from insufficient prior knowledge and the lack of a stopping criterion. This work introduces a high-resolution reference image to tackle these issues. Specifically, we explicitly broadcast the reference image into the proposed network, transferring its structural priors to the recovered image. In addition, the reference image helps to develop a criterion for determining the best-reconstructed image, so training stops automatically once the conditions are met. Experimental results show that the proposed method can reduce artifacts without using a prior training set.
In recent years, motivated by excellent performance in automatic feature extraction and complex pattern detection from raw data, deep learning technologies have been widely used to analyze fMRI data for Alzheimer's disease classification. However, most current studies do not take full advantage of the temporal and spatial features of fMRI, which may discard important information and degrade classification performance. In this paper, we propose a novel deep learning approach that learns the temporal and spatial features of 4D fMRI for Alzheimer's disease classification. The model is composed of a 3D convolutional neural network (3DCNN) and a recurrent neural network. Experimental results demonstrate that the proposed approach can discriminate Alzheimer's patients from healthy controls with a high accuracy rate.
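One plausible layout for such a model is a shared 3D CNN encoder applied to each time point, followed by an LSTM over the resulting feature sequence. The sketch below is illustrative only; all layer sizes and the two-class head are assumptions:

```python
import torch
import torch.nn as nn

class FMRIClassifier(nn.Module):
    def __init__(self, n_classes=2, feat_dim=128):
        super().__init__()
        # per-volume 3D CNN encoder (spatial features)
        self.encoder = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        # recurrent model over the scan's time axis (temporal features)
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                   # x: (B, T, 1, D, H, W)
        b, t = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])        # classify from the last state
```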
Subcortical brain segmentation is challenging due to the anatomical variability in both shape and size, between patients, of structures such as the thalamus, hippocampus, and amygdala. Measuring the volume and surface of these structures requires accurate segmentation, yet few methods achieve it because the structures' boundaries are obscure in MR images. We propose an attention-based convolutional neural network for subcortical brain segmentation. In our method, image clipping is first applied for pre-processing; accurate segmentation is then obtained using the attention-based convolutional neural network; and maximum-connectivity filtering is finally applied for post-processing. Experimental results on 35 subjects show that the proposed method segments brain regions with higher accuracy than other methods. The Dice, TPR, and VD measures show that the proposed method provides a precise and robust segmentation estimate, making it a suitable aid for the manual subcortical brain segmentation task.
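The maximum-connectivity post-processing step can be implemented with standard connected-component labeling; below is a minimal SciPy sketch that keeps only the largest component of a binary prediction (a generic illustration, not necessarily the authors' exact rule):

```python
import numpy as np
from scipy import ndimage

def largest_component(mask):
    """Keep only the largest connected component of a binary mask."""
    labeled, n = ndimage.label(mask.astype(bool))
    if n == 0:
        return mask                               # nothing predicted
    sizes = ndimage.sum(mask.astype(bool), labeled, range(1, n + 1))
    return (labeled == (np.argmax(sizes) + 1)).astype(mask.dtype)
```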
Diabetic Retinopathy (DR) is a complication of diabetes with a high blindness rate. Diagnosing DR requires examining the patient's fundus several times a year, which is a heavy burden for the patient and consumes substantial medical resources. Since soft exudate is an early indicator of DR, an automated and exact segmentation method for soft exudate supports rapid diagnosis. Despite recent advances in medical image processing, soft exudate segmentation remains unsatisfactory due to the limited amount of data, imbalanced categories, varying scales, and so on. In this work, an improved U-shaped neural network (IUNet) is proposed according to the characteristics of soft exudate; it consists of a contracting path and a symmetric expanding path, both composed of convolutional layers, multi-scale modules, and shortcut connections. During training, a data enhancement strategy generates more training data, and a weighted cross-entropy loss function suppresses the positive/negative sample imbalance. The proposed method performs excellently on the soft exudate task of the Indian Diabetic Retinopathy Image Dataset (IDRiD): the area under the precision-recall curve (AUPR) score is 0.711, superior to state-of-the-art models.
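The weighted cross-entropy idea is standard and easy to reproduce. Below is a minimal PyTorch sketch in which a positive-class weight up-weights the sparse soft-exudate pixels; the value 20.0 is an arbitrary example, not the paper's setting:

```python
import torch
import torch.nn as nn

# pos_weight > 1 up-weights the rare foreground (soft exudate) pixels
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([20.0]))

logits = torch.randn(4, 1, 256, 256)                   # raw network outputs
target = torch.randint(0, 2, (4, 1, 256, 256)).float() # binary masks
loss = criterion(logits, target)
```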
Hyperspectral remote sensing images are particularly well suited to detecting the types of materials in a scene due to their unique spectral properties. This paper proposes a novel semantic segmentation method for hyperspectral images (HSI) based on a new spatial-spectral filtering called extended extrema morphological profiles (EEMPs). First, principal component analysis (PCA) is used as the feature extractor, constructing feature maps from the first informative component of the HSI. Second, extrema morphological profiles (EMPs) extract spatial-spectral features from the informative feature maps to construct the EEMPs. Finally, a support vector machine (SVM) produces the semantic segmentation from the EEMPs. The proposed method is evaluated on a widely used hyperspectral dataset, the Houston dataset, with four metrics: class accuracy (CA), overall accuracy (OA), average accuracy (AA), and the Kappa coefficient. The experimental results demonstrate that EEMPs efficiently achieve good semantic segmentation accuracy.
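The PCA-profile-SVM pipeline can be sketched with off-the-shelf tools. In the sketch below, plain grayscale openings and closings at several scales stand in for the paper's extrema morphological profiles, so it illustrates the shape of the pipeline rather than the exact EEMP operator:

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def profile_features(hsi):                       # hsi: (H, W, bands)
    h, w, b = hsi.shape
    # first principal component as the informative feature map
    pc1 = PCA(n_components=1).fit_transform(hsi.reshape(-1, b)).reshape(h, w)
    profile = [pc1]
    for size in (3, 5, 7):                       # multi-scale profile
        profile += [grey_opening(pc1, size), grey_closing(pc1, size)]
    return np.stack(profile, axis=-1).reshape(-1, len(profile))

# pixel-wise SVM on labeled pixels, e.g.:
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```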
To study the effect of skip connections on segmentation performance in encoder-decoder networks, this paper improves the skip connections of the U-Net model with a sub-module fusion connection method. We fuse the high and low layers of the encoder by multi-head attention; each fusion is performed separately, and the fusion result is connected to the decoder. Considering that different input images affect model training differently due to factors such as noise, we set a threshold based on the Euclidean distance between the image and its mask during training, so that different images use different skip connection methods. Experiments on the Cell Nuclei, Synapse, Heart, and CHAOS datasets show that the proposed FSC-UNet algorithm outperforms existing algorithms.
Diabetic retinopathy is one of the main complications of diabetes and the most important factor leading to blindness in the late stage of the disease; it often manifests as one or more lesions in clinical diagnosis. To reduce the difficulty of detection, segmenting the optic disc in retinal images is of great significance. This paper proposes an improved context encoding network (CE-Net) architecture for segmenting the optic disc in diabetic retinal images. The architecture is divided into three parts: a feature encoder module, a context extractor module, and a feature decoder module. The context extractor consists of an improved dense atrous convolution block (DAC) and residual multi-kernel pooling (RMP). Experimental results show that the optimal model generated by the improved CE-Net architecture performs well on the Indian Diabetic Retinopathy Image Dataset (IDRiD); compared with other methods, ours has the lowest mean overlap error and the highest accuracy and sensitivity.
Image segmentation is a classical problem in computer vision, and with the extensive development of deep learning, semantic segmentation has made much progress. However, mainstream networks such as Fast-SCNN and U-Net still face challenges. A common problem is that linear interpolation is used in the up-sampling stage to obtain high-resolution outputs; lacking sufficient feature information, the contours of objects in the image become blurred and gridded. For this reason, we propose a new super-resolution (SR) module to replace linear-interpolation up-sampling in the network model. Five representative networks integrated with the proposed SR module are verified on the CamVid dataset. The experimental results show a 2%-4% improvement in mIoU (mean Intersection over Union) and a 2%-3% improvement in pixel accuracy, demonstrating the generalization and effectiveness of our method.
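A common way to realize such a learned SR up-sampling module is sub-pixel convolution (PixelShuffle); the block below is an illustrative example of the idea, not the authors' exact module:

```python
import torch.nn as nn

class SRUpsample(nn.Module):
    """Learned 2x up-sampling that can replace bilinear interpolation."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),          # rearranges channels into space
            nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)
```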
To improve the segmentation accuracy of colonoscopy polyp images, we propose the PVT Dual-Upsampling Net (PDNet). PDNet adopts a Transformer-based encoder as the backbone for downsampling, and designs a dual upsampling module, based on a cascaded fusion network and a simple connection network, to recover the high-level image features lost during downsampling, yielding a high-level semantic feature map with the same resolution as the input image. A multi-feature fusion module then aggregates the low-level feature map and the high-level semantic feature map. We validate the model on three publicly available datasets; our experimental evaluations show that the suggested architecture produces good segmentation results.
Heart segmentation is challenging due to the poor contrast of the heart in CT images. Since manual segmentation of the heart is tedious and time-consuming, we propose an attention-based convolutional neural network (CNN) for heart segmentation. First, one-hot preprocessing is performed on the multi-tissue CT images; a U-Net with attention gates is then applied to obtain the heart region. We compared our method with several CNN methods in terms of the Dice coefficient, and the results show that our method outperforms the others.
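For reference, the Dice coefficient used in the comparison is the standard overlap measure and can be computed as follows:

```python
import torch

def dice_coefficient(pred, target, eps=1e-6):
    """pred, target: binary masks of identical shape."""
    pred, target = pred.float().flatten(), target.float().flatten()
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)
```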
Existing image inpainting methods have shown promising performance in filling missing regions with visually plausible content; however, they tend to produce distorted structure and blurry texture. To address these issues, we propose a two-stage inpainting network that combines texture generation and image completion. In the first stage, a texture generator hallucinates texture for the missing regions to guide reconstruction in the next stage. In the second stage, since the texture prior would gradually lose its guiding role as the network deepens, we adopt a residual texture prior to generate fine details. We also introduce a cross-layer contextual attention module that not only learns contextual attention in the decoder feature map but also benefits from similar features shifted from the encoder, generating reasonable structure and realistic texture. Qualitative and quantitative comparisons on the Paris StreetView and CelebA datasets demonstrate that our method achieves better inpainting performance than existing methods.
Computer imaging methods are widely used in medical problems. Imaging is readily used for diagnostic purposes due to its availability, non-invasiveness, and high quality. Given the great number of medical conditions and the frequent lack of qualified medical staff, there is a need to automate the evaluation of radiological examinations; the neural analysis of medical images is therefore a quickly growing branch of science. This paper presents the possibility of using computer image analysis and neural modeling to assess the metric age of children and adolescents from digital pantomographic images. The analog methods used in the clinical assessment of a patient's chronological age are subjective and have low accuracy. The paper demonstrates the use of RBF networks and deep learning in assessing the metric age of children aged 4 to 15 years. As a result, two neural models with quality ranging from 97% to 99% were obtained.
Underwater image processing and analysis have been a research hotspot in recent years, as more emphasis has been placed on underwater monitoring and the use of marine resources. Compared with open environments, underwater images suffer more complicated conditions such as light absorption, scattering, turbulence, nonuniform illumination, and color diffusion. Although enhancement techniques have made considerable advances in resolving these issues, they treat low-frequency information equally across the entire channel, which limits the network's representativeness. We propose a deep learning, feature-attention-based end-to-end network (FA-Net) to solve this problem. In particular, we propose a Residual Feature Attention Block (RFAB) containing channel attention, pixel attention, and a residual learning mechanism with long and short skip connections. RFAB lets the network focus on learning high-frequency information while low-frequency information bypasses it through the skip connections. The channel and pixel attention mechanisms account for each channel's different features and the uneven distribution of haze over the pixels in the image. Experimental results show that FA-Net provides higher accuracy both quantitatively and qualitatively and is superior to previous state-of-the-art methods.
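To make the RFAB idea concrete, here is a hedged PyTorch sketch of a residual block that applies channel attention followed by pixel attention; all layer sizes and the channel-reduction ratio are illustrative assumptions, not the paper's specification:

```python
import torch.nn as nn

class RFABSketch(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(c, c, 3, padding=1))
        self.channel_att = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                         nn.Conv2d(c, c // 8, 1), nn.ReLU(),
                                         nn.Conv2d(c // 8, c, 1), nn.Sigmoid())
        self.pixel_att = nn.Sequential(nn.Conv2d(c, 1, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.body(x)
        y = y * self.channel_att(y)   # reweight channels
        y = y * self.pixel_att(y)     # reweight spatial positions
        return x + y                  # short skip (residual) connection
```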
Cervical cancer is the second most common malignancy in women, yet it can be prevented by diagnosing and treating cervical precancerous lesions. Clinically, histopathological image analysis is recognized as the gold standard for diagnosis. However, diagnosing cervical precancerous lesions is challenging due to the massive size of whole slide images (WSIs) and subjective grading without precise quantification criteria. Most existing computer-aided diagnosis approaches are patch-based, first learning patch-wise features and then aggregating these local features to infer the final prediction. Cropping pathology images into patches restricts the contextual information available to such networks, so they fail to learn clinically relevant structural representations. To address these problems, this paper proposes a novel weakly supervised learning method called the general attention network (GANet) for grading cervical precancerous lesions. A bag-of-instances pattern is introduced to overcome the limitation of the high resolution of WSIs. Moreover, based on two transformer blocks, the proposed model encodes the dependencies among bags and instances, capturing much more informative contexts and thus producing more discriminative WSI descriptors. Finally, extensive experiments on a public cervical histology dataset show that GANet achieves state-of-the-art performance.
To address the insufficiency of training data when deep learning is applied to surface defect detection, a surface defect generation algorithm based on a generative adversarial network (GAN) is proposed to enhance the training samples. First, a U-shaped convolutional network is designed with a spatially adaptive normalization structure that lets a mask image control the generated defect shape, completing the mapping from defect-free image to defect image. Second, a multi-layer convolutional discriminator network extracts adversarial features from real and generated samples. Finally, the adversarial training loss is designed and adversarial training of the generative network is completed. Quantitative contrast experiments prove that a segmentation network trained with data produced by the surface defect generation algorithm achieves better segmentation results than one trained without this data augmentation.
Existing high-precision deep learning stereo matching networks are structurally complex, making them difficult to deploy and run in real time on edge platforms. An improved stereo matching algorithm based on RTStereoNet is therefore proposed. First, a channel attention mechanism is introduced in RTStereoNet's matching cost aggregation stage so that the network can adaptively enhance the extraction of effective information and reduce matching ambiguity. Second, in RTStereoNet's disparity refinement stage, the color image is introduced to compensate for the loss of detail caused by the network's large-scale downsampling, and a lightweight disparity refinement module is constructed to expand the network's receptive field. In addition, a dedicated edge computing platform is built on the Jetson Xavier NX module; with the TensorRT inference framework, support for special operators is implemented through CUDA programming, and accelerated deployment is achieved for both the original and improved models. The results show that, after accelerated deployment, the improved model reaches an inference speed of 30 fps on the KITTI2015 test set with higher accuracy than the original model.
Subspace clustering methods are used for unsupervised learning, and Sparse Subspace Clustering (SSC) is one of the most popular. Since the ℓ1 optimization in SSC requires complex computation, Orthogonal Matching Pursuit (OMP) is adopted in OMP-SSC to reduce the computation time, but its performance is unsatisfactory. In this paper, a new algorithm, Orthogonal Matching Pursuit with Adaptive Restriction for Sparse Subspace Clustering (OMPAR-SSC), is proposed, in which two adaptive restrictions that vary with the strength or density of connections are developed. Our algorithm improves the connectivity of the affinity graph and enhances the segmentation effect. Experiments on both synthetic and real-world data demonstrate that OMPAR-SSC outperforms other subspace clustering algorithms in accuracy and achieves a good trade-off between efficiency and effectiveness.
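For context, the OMP core used by OMP-based SSC greedily selects a few other data points to represent each sample in a self-expressive fashion. A compact NumPy sketch of plain OMP follows; the paper's adaptive restrictions are not reproduced here:

```python
import numpy as np

def omp(dictionary, signal, k):
    """Greedily pick k columns of `dictionary` to represent `signal`.
    In SSC, `dictionary` holds all other data points as columns."""
    residual, support, coef = signal.copy(), [], np.array([])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(dictionary.T @ residual))))
        atoms = dictionary[:, support]
        coef, *_ = np.linalg.lstsq(atoms, signal, rcond=None)
        residual = signal - atoms @ coef
    coeffs = np.zeros(dictionary.shape[1])
    coeffs[support] = coef
    return coeffs
```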
Remote sensing technology can quickly identify hydrothermal alteration related to mineralization and aid prospecting through efficient and accurate analysis. In this study, the Zhaoping fault zone is taken as the research area, with well-imaged Landsat-8 OLI remote sensing images from 2015 and 2020 as the data source; ENVI software is used for cropping, radiometric calibration, atmospheric correction, and preprocessing to remove interference from water and vegetation. Mineralization alteration information is extracted by the CROSTA method, and the extracted thematic information is analyzed and verified. In addition, comparing the 2015 and 2020 mineralization information maps reveals that over-exploitation of mineral resources still exists. The results agree well with the known ore sites in the study area, indicating that extracting metal mineralization information from remote sensing images is of great significance for the study and rational utilization of mineral resources.
Studies have found that autism spectrum disorder is a diffuse developmental disease of the central nervous system. Most autism cases result from a combination of genetic predisposition and environmental factors that influence early brain development, although a few are caused by genes alone. Traditional diagnosis of autism spectrum disorder usually relies on interviews and questionnaires, which take considerable time and can lead to misdiagnosis. The primary purpose of this study is to compare recent machine learning and deep learning classification methods for distinguishing autism spectrum disorder from typical development. Experiments are conducted to discuss their strengths and weaknesses, and the results are presented for further research.
Image distortion detection is a key step in image quality assessment and image reconstruction algorithms. Previous work has largely focused on detecting a single distortion in an image, yet the number of distortion types present is often uncertain. We therefore propose a model for hybrid distortion detection. Concretely, we cast hybrid distortion detection as a multi-label classification task and formulate it as a convolutional network optimization problem. A dataset is created to train the model and evaluate its performance. Experiments show that the proposed model performs well in detecting hybrid distortions in images.
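Casting hybrid distortion detection as multi-label classification amounts to one sigmoid output per distortion type with a binary cross-entropy objective. The tiny sketch below illustrates this framing; the five distortion types and the toy backbone are assumptions for illustration:

```python
import torch
import torch.nn as nn

n_distortions = 5                              # assumed number of types
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                         nn.Linear(16, n_distortions))
criterion = nn.BCEWithLogitsLoss()             # multi-label objective

x = torch.randn(2, 3, 224, 224)
labels = torch.randint(0, 2, (2, n_distortions)).float()
loss = criterion(backbone(x), labels)
predicted = torch.sigmoid(backbone(x)) > 0.5   # independent per-type decision
```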
In video sequence modeling, the transformer is currently the best-performing recognition architecture. Popular transformer-based video classification methods focus on the importance of the current feature in the temporal sequence; they characterize simultaneous order insufficiently, and simple data augmentation yields unstable classification. In this paper we propose a method that combines non-parametric attention with self-supervised feature construction to further improve video classification. A non-parametric attention mechanism is constructed over the simultaneous-order features to fit a multi-local-extremum distribution. At the same time, during model learning, the input video is randomly masked in the temporal and spatial domains, and self-supervised information is added to learn the details and class-relevant content of the video effectively. Experiments on the Kinetics-400, Kinetics-600, and Something-Something V2 datasets show that the proposed algorithm improves accuracy over the current best methods.
Tensor decomposition has been extensively studied for convolutional neural network (CNN) model compression. However, directly decomposing an uncompressed model into low-rank form causes unavoidable approximation error, because a pre-trained model lacks the low-rank property. In this manuscript, a CNN model compression method using an alternating constraint optimization framework (ACOF) is proposed. ACOF first formulates tensor decomposition-based model compression as a constraint optimization problem with low tensor-rank constraints. This problem is then solved systematically and iteratively using the alternating direction method of multipliers (ADMM). During the alternating process, the uncompressed model gradually acquires the low-rank tensor property, making the approximation error of the low-rank tensor decomposition negligible. Finally, a high-performance compressed CNN is obtained by SGD-based fine-tuning. Extensive experiments on image classification show that ACOF produces optimal compressed models with high performance and low computational complexity. Notably, ACOF compresses ResNet-56 to 28% of its size without an accuracy drop, and the compressed model has 1.14% higher accuracy than the learning-compression (LC) method.
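The key projection inside such an alternating scheme is the closest low-rank approximation of a weight tensor. For a matricized weight this is the truncated SVD (Eckart-Young theorem), sketched below as a generic illustration rather than ACOF's exact update:

```python
import numpy as np

def project_low_rank(W, r):
    """Best rank-r approximation of matrix W in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```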
Along with the prosperity and development of computer vision technologies, fine-grained visual classification (FGVC) has become an intriguing research field due to its broad application prospects. The major challenges of fine-grained classification are two-fold: localizing discriminative regions and extracting fine-grained features. The attention mechanism is a common choice of current state-of-the-art (SOTA) FGVC methods and can significantly improve the ability to distinguish among fine-grained categories: attention modules of various designs capture the discriminative region, and region-based feature representations encode subtle inter-class differences. However, without proper supervision the attention mechanism may not learn to provide informative guidance toward the discriminative region, and thus can be meaningless in FGVC tasks that lack part annotations. We propose a weakly supervised attention mechanism that integrates visual explanation methods to address the confusion in discriminative region localization caused by the absence of supervision, while avoiding labor-intensive bounding box/part annotations. We employ Score-CAM, a novel post-hoc visual explanation method based on class activation mapping, to supervise and constrain the attention module. Extensive experiments show that the proposed method outperforms current SOTA methods on three fine-grained classification tasks: CUB Birds, FGVC Aircraft, and Stanford Cars.
Aiming at the possibility of one or more diseases and the unbalanced label distribution in fundus images, this paper proposes a multi-label classification method for fundus diseases based on fusing meta-data with an EB-IRV2 network. First, EfficientNet-B2 and InceptionResNetV2 networks extract feature information from the left and right fundus images; these features are then fused with the patient meta-data and sent to the classifier for multi-label classification of fundus diseases. Adding the patient's meta-information helps the model better capture the lesion information and lesion location in the fundus image, thus improving recognition accuracy. The experimental results show that the model achieves good classification results on the ODIR fundus image database, with an accuracy of 96.00%, a recall of 92.37%, and an F1-score of 94.11%, indicating that the proposed model is robust for multi-label fundus image classification.
Hyperspectral image (HSI) classification aims to assign each hyperspectral pixel an appropriate land-cover category. In recent years, deep learning (DL) has received attention from a growing number of researchers, and DL-based hyperspectral classification methods have shown admirable performance, but there is still room for improvement in exploiting the spatial and spectral dimensions. To improve classification accuracy and reduce training samples, we propose a double-branch attention network (OCDAN) based on 3D octave convolution and dense blocks. Specifically, a 3D octave convolution model and a dense block first extract spatial and spectral features, respectively; a spatial attention module and a spectral attention module then highlight the more discriminative information; finally, the extracted features are fused for classification. Compared with state-of-the-art methods, the proposed framework achieves superior performance on two hyperspectral datasets, especially when training samples are severely limited. In addition, ablation experiments validate the role of each part of the network.
Grapes are an important fruit worldwide, and there are so many varieties that their category must be identified for agricultural product quality inspection; large-scale grape classification is difficult to achieve with traditional manual methods. In the past several years the accuracy of image classification has improved thanks to deep convolutional networks; however, recognizing fine-grained images in natural scenes remains a difficult task in computer vision. Grapes grow in natural, complex orchard environments, so the quality of photographed images is affected by illumination, shadow, blur, and the like, and it is hard to categorize these natural-scene sub-class and inter-class images due to their similarity and environmental interference. We collect natural-scene grape images and first divide them into a 9-category dataset, then into a 10-category dataset because of the significant difference in recall for the Yongyou-one grape. Transfer learning based on Inception-v3 and several other deep convolutional networks is used to analyze the classification performance on the fine-grained image dataset, and the effects of network model size and dataset size are analyzed. We obtain 98.494% classification accuracy on the 10-category dataset, a relative improvement of 0.8% over the 9-category dataset, and the loss on the 10-category dataset is more stable than on the 9-category dataset.
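The Inception-v3 transfer-learning setup is standard; a minimal torchvision sketch follows, with the 10-class head mirroring the 10-category dataset (the weight name, frozen-feature policy, and handling of the auxiliary head are assumptions, not the paper's configuration):

```python
import torch.nn as nn
from torchvision import models

model = models.inception_v3(weights="IMAGENET1K_V1")  # ImageNet-pretrained
model.aux_logits = False                  # ignore the auxiliary head here
for p in model.parameters():
    p.requires_grad = False               # freeze the pretrained features
model.fc = nn.Linear(model.fc.in_features, 10)  # new 10-category classifier
# then train only model.fc.parameters() on the grape images
```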
To fully exploit spatio-temporal features for video action classification, we propose a multi-visual-information fusion temporal prediction network (MI-TPN) based on the feature aggregation model ActionVLAD. The method has three parts: multi-visual information fusion, temporal feature modeling, and spatio-temporal feature aggregation. In multi-visual information fusion, RGB features and optical flow features are combined, fully considering visual context and the details of action description. In temporal feature modeling, the temporal relationship is modeled by an LSTM to measure the relative importance of the temporal description features. Finally, in feature aggregation, a time-step feature and a spatio-temporal center attention mechanism aggregate the features and project them into a common feature space. The method obtains good results on three commonly used benchmark datasets: UCF101, HMDB51, and Something-Something.
Rate-distortion optimized quantization (RDOQ) is an important technique in video coding standards that effectively improves encoding efficiency. However, the large computational complexity and strong data dependency of the RDOQ calculation limit real-time encoding in hardware designs. In this paper, a fast RDOQ algorithm is proposed, comprising an RDOQ skip algorithm and an optimized rate estimation algorithm. First, by detecting pseudo all-zero blocks (PZB) in advance, unnecessary RDOQ passes are skipped, reducing computational complexity. Second, by optimizing the elements used in the rate estimation of the RDOQ process, its strong data dependency is alleviated, allowing RDOQ to execute in parallel. Experimental results on HPM-4.0.1 of AVS3 show that the proposed algorithm reduces encoding time by 27.6% and 30.6%, with only 0.3% and 0.1% average BD-rate loss, under the low-delay P and random-access configurations, respectively.
With the explosive growth in the number of online videos, video retrieval is becoming increasingly difficult. Video-text retrieval based on multi-modal visual and language understanding is one of the mainstream frameworks for solving this problem, and MMT (Multi-modal Transformer) is a novel, mainstream model. On the language side, BERT (Bidirectional Encoder Representations from Transformers) encodes the text into semantic embeddings and is fine-tuned during training. However, there is a mismatch at this stage: BERT's pre-training tasks, NSP (next sentence prediction) and MLM (masked language modeling), are only weakly correlated with video retrieval. On the visual side, a Transformer aggregates multi-modal experts of videos, and we find that its output is not fully utilized. In this paper, a Sentence-BERT model is introduced to replace the BERT model in MMT and improve the quality of the sentence embeddings. In addition, a max-pooling layer is adopted after the Transformer to improve the utilization of the model's output. Experimental results show that the proposed model outperforms MMT.
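The proposed aggregation change is simple to state in code: rather than reading a single token, take an element-wise maximum over the visual Transformer's output sequence. A one-function sketch with assumed shapes:

```python
import torch

def pool_video_descriptor(transformer_out):
    """transformer_out: (batch, seq_len, dim) tokens from the visual side."""
    return transformer_out.max(dim=1).values  # one descriptor per video
```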
Moored data buoys are floating platforms at sea that serve as in-situ weather, ocean, and tsunami observatories, transmitting real-time data through 3G/GSM/GPRS and satellite telemetry. Damage to buoy systems by humans, boats, or ships, intentional or otherwise, causes loss of data and inhibits early warning systems. It also has financial implications: the loss of instruments, repair and reinstallation charges, and the ship time spent fixing the buoy. Analyzing the video footage is challenging because it is unstable and shaky due to the continuous movement of the floating platforms with the state of the sea. This paper explores object detection algorithms for detecting eight different objects commonly found in camera footage transmitted by buoy platforms at sea. The object detection training achieved a best accuracy of 0.867 mAP at 0.5 IoU. The detector can support applications such as object search, detection of floating marine plastic debris, and understanding the direction of motion of ships and boats; in a broader perspective, it can help in surveillance, market surveys, and fish detection in underwater cameras for fish abundance studies.
The newest generation of the Audio Video Coding Standard (AVS3) provides better coding efficiency than its predecessor through two newly adopted partitioning structures, the Extended Quad-Tree (EQT) and Binary Tree (BT). Although these split tools bring remarkable coding performance, they come at the price of increased computational coding complexity. For popular conference-video applications, experiments show that the EQT and BT split counts differ considerably across regions, indicating that it is unnecessary to offer all candidate partitioning modes in every area. In this work, an effective partitioning resource allocation method is proposed to reduce computational complexity while preserving coding performance. Specifically, a decision tree (DT) model is trained to determine the available partitioning modes for the current coding unit (CU); the input features are the histogram, Sobel texture, and average residual difference between the current and reference CU, along with the CU size. The training data are selected from AVS test sequences and Joint Video Experts Team (JVET) common test condition sequences, identified by structural similarity (SSIM). Experiments on 720p and Common Intermediate Format (CIF) sequences, implemented on the AVS3 reference software HPM-9.1 under the low-delay B (LB) configuration, show the efficiency of the proposed method: more than 40.0% computational complexity reduction with a BDBR loss of less than 2.0%.
The output video of optical equipment in the aerospace measurement and control field is prone to image quality degradation caused by unstable manual operation. To improve the classical motion-estimation-based video stabilization algorithm, a novel video stabilization method based on foreground detection is proposed in this paper. First, an object detection dataset based on historical images of the launch center is collected and labeled. Second, drawing on transfer learning and prior knowledge of launch-center imagery, a YOLO-based object detection method for rocket launching scenes is designed. The object detection is then introduced into the motion-estimation-based stabilization pipeline as a foreground detector, so that tracked feature points on moving foreground objects are filtered out to reduce the global motion estimation error; the erroneous stabilization of the classic motion-estimation-based method is thus avoided. Experiments show that the proposed method achieves better stabilization in both subjective and objective evaluations. This paper offers a useful reference for applying deep learning and artificial intelligence technology in the aerospace measurement and control field.
Video stabilization is a video enhancement technology that improves the original video by eliminating unwanted camera motion. Over the last decade of research, video stabilization has evolved from simple solutions aimed at computational simplicity to complex solutions aimed at stabilization quality. We propose a novel method based on Grid-based Motion Statistics (GMS) and warping transformations that stabilizes video with less cropping. Specifically, feature points are first matched by GMS, and RANSAC is applied within each frame to estimate the motion vectors accurately. Furthermore, we incorporate predictive adaptive path smoothing to produce stable trajectories and generate the stabilized video with a warping transformation. To the best of our knowledge, the proposed algorithm offers less cropping and better stability than previous work. Experimental results demonstrate the performance of our method on a wide variety of consumer videos.
Effective treatment of lung cancer requires accurate diagnosis of mediastinal lymph node metastasis (LNM). In current clinical practice, invasive examination is considered the gold standard, but it is inefficient and may cause complications for the patient. Therefore, automatic diagnosis of LNM from computed tomography (CT) images based on Deep Learning (DL) has become an important research direction in aided diagnosis. DL methods require a large amount of high-quality data to achieve good results. However, labels for LNM are difficult to obtain, and this lack of annotations limits the classification accuracy of deep networks. In this paper, we propose a semi-supervised multiple image transformation network (MITNet) for LNM prediction in CT images. We apply multiple image transformations to the images and feed them to feature extractors to obtain multi-dimensional features, then use an attention-based module (ABM) to adaptively fuse the features and accurately predict LNM. In addition, to address the insufficient data volume, we introduce a semi-supervised learning strategy that trains the network with CT images containing only lymph node (LN) segmentation annotations, improving its generalization ability. Experimental results show that our proposed method reaches an accuracy of 92.45% and outperforms several state-of-the-art methods.
We propose a no-reference (NR) stereoscopic video quality assessment (SVQA) model based on Tchebichef moments. Specifically, we extract keyframes according to the mutual information between adjacent frames, and the extracted keyframes are segmented into patches on which low-order Tchebichef moments are computed. Since Tchebichef moments have strong descriptive ability and different orders represent independent features with minimal information redundancy, we extract statistical features of the Tchebichef moments of the patches as spatial features. Considering the influence of spatiotemporal distortions on video quality, we use three-dimensional derivative-of-Gaussian filters to compute spatiotemporal energy responses and extract statistical features from the responses as spatiotemporal features. Finally, we combine the spatial and spatiotemporal features to predict the quality of stereoscopic videos. The proposed model is evaluated on the NAMA3DS1-COSPAD1, SVQA and Waterloo IVC Phase I databases. The experimental results show that the proposed model achieves competitive performance compared with existing SVQA models.
Skip/direct mode is one of the inter-prediction modes in video coding and achieves high coding performance. In Audio Video coding Standard 3 (AVS3), skip/direct mode gains further performance by using more candidate modes; the candidate list is generated from numerous prediction directions with corresponding predicted motion vectors. However, this results in higher computational complexity and challenges for parallel computation, especially in hardware implementations. To resolve this problem, we propose a hardware architecture for skip/direct mode with a fast motion vector prediction (MVP) algorithm. The fast MVP algorithm sets a search window within which unnecessary MVP candidates are skipped, efficiently reducing the number of candidates and the computational complexity. The proposed hardware architecture is then presented in detail with efficient pipeline schedules. The experimental results show that our architecture meets the requirement of 3840x2160@60FPS with only 0.48% and 0.42% BD-Rate increase under the low delay P (LDP) and random access (RA) configurations, respectively.
To effectively solve the problem of real-time distance measurement of traffic signs in intelligent-driving environment perception, a distance measurement method based on binocular vision is proposed. To meet the real-time requirement, the paper proposes building a correction mapping table from the calibrated camera parameters, through which the corrected coordinates corresponding to any distorted coordinates can be read out directly. The coordinates of the traffic sign in the left and right images are obtained through pyramid template matching; the disparity is then computed and the distance measured. The error rate of the measurement method is less than 2.33% within 20 to 60 meters, and a single measurement takes less than 20 ms in an embedded environment.
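Once the left/right coordinates of a sign are matched, the distance follows from the standard stereo triangulation relation Z = f*B/d; a minimal sketch (parameter names are ours):

def distance_from_disparity(x_left, x_right, focal_px, baseline_m):
    # x_left, x_right: rectified horizontal coordinates of the matched sign
    disparity = x_left - x_right               # pixels
    if disparity <= 0:
        raise ValueError("non-positive disparity")
    return focal_px * baseline_m / disparity   # meters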
Active speaker detection plays a vital role in human-machine interaction. Recently, a few end-to-end audiovisual frameworks have emerged. However, the inference time of these models was not explored, and they are not applicable to real-time applications due to their complexity and large input size. In addition, they all adopt a similar feature extraction strategy that applies ConvNets to both audio and visual inputs. This work presents a novel two-stream end-to-end framework that fuses features extracted from images via VGG-M with raw Mel Frequency Cepstral Coefficient features extracted from the audio waveform. The network attaches two BiGRU layers to each stream to handle each stream's temporal dynamics before fusion; after fusion, one BiGRU layer models the joint temporal dynamics. Experimental results on the AVA-ActiveSpeaker dataset indicate that our feature extraction strategy is more robust to noisy signals and offers better inference time than models that employ ConvNets on both modalities. The proposed model predicts within 44.41 ms, which is fast enough for real-time applications. Our best-performing model attained 88.929% accuracy, nearly the same detection result as state-of-the-art work.
Tracking specific objects in images or videos is one of the most attractive problems in vision tasks, widely employed in security monitoring, automatic driving, military operations and other scenarios. Recently, object trackers based on convolutional neural networks, especially Siamese networks, have obtained high accuracy and been studied in depth. However, in practical tracking scenarios, when the background is cluttered or the object is occluded, tracking accuracy drops rapidly, and in extreme cases the tracker loses the target. It is therefore particularly necessary to relocate the target quickly and accurately. To this end, an anti-interference tracker based on a Siamese convolutional neural network is developed. Benefiting from an adaptive tracking confidence parameter, once the tracking quality drops significantly during tracking, the location of the object is corrected immediately. Experimental results show that the proposed method can effectively relocate and track the target after occlusion or loss.
Visual relationship detection aims to locate objects in images and recognize the relationships between them. Traditional methods treat all observed relationships in an image equally, which leads to relatively poor performance on complex images with abundant objects and various relationships. To address this problem, we propose an attention-based model, AVR, that captures salient visual relationships using both the local and global context of the relationships. Specifically, AVR recognizes relationships and measures attention over them in the local context of an input image by fusing the visual features, semantic information, and spatial information of the relationships. AVR then applies this attention to assign important relationships larger salience weights for effective information filtering. Furthermore, AVR integrates prior knowledge from the global context of image datasets to improve the precision of relationship prediction, where the context is modeled as a heterogeneous graph and the prior probability of relationships is measured with a random walk algorithm. Comprehensive experiments on several real-world image datasets demonstrate the effectiveness of AVR, and the results show that it significantly outperforms state-of-the-art visual relationship detection methods, by up to 87.5% in terms of recall.
We address the problem of accurately geolocating an image at large city scale. Image geolocation is the process of identifying the place depicted in an image using geotagged reference images of the same place. This is a challenging task due to appearance changes in large outdoor environments and, in particular, the difficulty of using large sets of geotagged images effectively for training. To overcome this limitation, we propose to select and predict good hybrid features and cast the prediction score as a classification task. To this end, we generate training features and learn the classifier offline. For the image representation phase, we propose a new method, called hybrid features, that makes the image representation robust to geometric and photometric changes while remaining highly discriminative. With this approach we achieve competitive results compared with baseline methods, and our results show a significant improvement when using hybrid features compared with using handcrafted models or deep learning methods individually.
Vision navigation is an alternative to the Global Positioning System (GPS) in environments where GPS access is denied, and camera pose estimation is its key technology. At present, pose estimation methods fall into two main families: feature-based methods and direct methods. In this paper, we theoretically analyze the basic principles of both. The Jacobian matrix of the cost function with respect to the pose, represented in the Lie algebra, is derived in detail, and a nonlinear optimization method is used to obtain the optimal camera pose. Finally, the accuracy, real-time performance, and robustness of the two methods are compared and analyzed through systematic and comprehensive experiments.
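For concreteness, the 2x6 Jacobian at the heart of both method families — the derivative of the pinhole projection (u, v) of a camera-frame point (X, Y, Z) with respect to a left se(3) perturbation \delta\xi = (\delta\rho, \delta\phi) — takes the standard textbook form below (our statement of the result; the residual's Jacobian carries an extra minus sign):

\frac{\partial (u,v)}{\partial \delta\xi} =
\begin{bmatrix}
\frac{f_x}{Z} & 0 & -\frac{f_x X}{Z^2} & -\frac{f_x XY}{Z^2} & f_x + \frac{f_x X^2}{Z^2} & -\frac{f_x Y}{Z} \\
0 & \frac{f_y}{Z} & -\frac{f_y Y}{Z^2} & -f_y - \frac{f_y Y^2}{Z^2} & \frac{f_y XY}{Z^2} & \frac{f_y X}{Z}
\end{bmatrix}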
Infrared and visual image fusion aims to integrate the salient and complementary features of an infrared image and a visual image into one informative image. To this end, we propose an infrared and visual image fusion method based on iterative quadtree decomposition and Bézier interpolation. Specifically, each source image is first decomposed into image patches of multiple sizes in a quadtree structure according to a fixed threshold; each patch in the quadtree is then smoothed by interpolating its four-by-four uniformly distributed pixels with the Bézier interpolation method. From the iteratively smoothed images, multiple scales of bright and dark feature maps of each source image are gradually extracted from the difference image of every two successively smoothed images. Finally, the infrared and visual images are fused by combining their multi-scale bright and dark features and their base images (i.e., the final-scale smoothed images). Extensive experiments verify that the proposed method outperforms five state-of-the-art image fusion methods in both qualitative and quantitative evaluations.
In the production of power cables, performance testing of the cable insulation sheath is an important step. Compared with traditional testing methods, machine vision offers stable operation, high precision, and high efficiency. Accordingly, based on machine vision theory, the structure of the old-fashioned tensile machine was first rebuilt, and the whole tensile test of the cable insulation sheath was imaged by a CMOS camera. A color recognition algorithm, an effective-area segmentation algorithm, a workpiece fracture judgment algorithm, and an erosion-based difference algorithm were proposed to calculate the distance between the marked lines and thus the elongation at break of the cable material. In systematic experiments on the same batch of cable jackets, the largest deviation of the elongation at break measured by visual inspection was no more than 1%. The experimental results and practical applications show that the machine-vision-based inspection system is more accurate, faster, and more stable and reliable than the traditional inspection system.
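The final arithmetic is simple; as a sketch (names are ours), with l0_px the initial spacing of the gauge marks in pixels and l_break_px the spacing at fracture:

def elongation_at_break(l0_px, l_break_px):
    # elongation at break, in percent of the original gauge length
    return (l_break_px - l0_px) / l0_px * 100.0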
To study the effect of flight altitude on the radiation characteristics of an attitude-control engine's lateral jet, we applied the apparent-ray line-of-sight method to solve the radiative transfer equation of the lateral jet, based on three-dimensional flow-field simulation results, the radiative transfer equation, and a molecular spectral-line parameter database, and established a procedure to calculate the infrared radiation characteristics of the lateral jet. The spectral line intensities of gases at high temperature and pressure are corrected, and a spectral band model is used to calculate the spectral absorption coefficients. The infrared radiation characteristics of the lateral jet at different flight altitudes are studied, and the distribution of the infrared radiance of the lateral jet in different bands is obtained. The lateral jet's spectral irradiance decreases with increasing flight altitude in the low-altitude environment and increases with increasing flight altitude in the high-altitude environment. The results show that the program simulates the infrared radiation characteristics of the attitude-control engine's lateral jet well and is widely applicable; flight altitude affects the infrared radiation characteristics of the lateral jet to a certain extent, and low and high altitudes affect them differently.
To address the lack of labels in current deep-learning-based infrared and visible image fusion, this paper proposes an infrared and visible image fusion algorithm based on unsupervised learning. The method exploits the characteristics of unsupervised learning and introduces high-gray-value infrared information into the visible image to obtain the fused image. The proposed network is composed of six convolution blocks, and a dual attention module is designed to make the fused image attend more to high-gray-value areas of the infrared image. By introducing skip connections, shallow features are fused with deep features, so that the details of the fused image are richer and halo artifacts are reduced. Extensive experimental results show that the proposed fusion method accurately highlights the target object while maintaining visible texture details, enhancing the visual effect for the human eye and facilitating target recognition. Quantitative results likewise show that the proposed algorithm has clear advantages on multiple indicators.
High-speed aircraft are very important for space safety and scientific exploration. When an aircraft flies at high speed in near space, the thin atmosphere around it forms high-speed convection. For optical detection devices, this convection causes an aero-optical effect that can seriously degrade the detection range and sensitivity of remote sensing systems. To assess the impact of the aero-optical effect on infrared detection devices, we study its formation mechanism and analyze the radiative transfer model. Through a static window heating test and a dynamic flow-field wind tunnel test, we verify the thermal radiation influence of the quartz window and the high-speed flow field on a short-wave infrared detection system. The experimental results show that, for a 900-1700 nm short-wave infrared imaging system, the short-wave infrared signal of the target can still pass through the quartz window below its melting-point temperature. By accurately controlling the exposure time, the thermal radiation effect of the high-speed flow field can be weakened and the target contrast improved for the infrared detection system.
A spectrogram clearly shows the composition of different frequencies in a speech signal. In this paper, a speech enhancement method based on deep-learning image processing is proposed, which optimizes the spectrogram of a laser-detected speech signal to achieve enhancement. The laser beam emitted by a laser Doppler vibrometer (LDV) is focused on a glass window to detect the vibration caused by sound waves; after conversion, the audio information that caused the vibration is obtained. Under the interference of speckle noise and air disturbance, the detected speech signal not only has a low signal-to-noise ratio (SNR) but also contains non-stationary noise. To overcome the difficulty traditional methods have in extracting weak signals under severe noise interference, we use deep learning to achieve spectrogram denoising and speech enhancement. By processing the spectrogram of noisy speech with a generative adversarial network (GAN) combined with a spatial attention mechanism, and by introducing the short-time objective intelligibility (STOI) measure into the loss function, the laser-detected speech signal was successfully enhanced.
LiDAR-based Simultaneous Localization and Mapping (LiDAR SLAM) plays a vital role in autonomous driving and has attracted considerable research attention. To achieve higher accuracy in motion estimation between adjacent LiDAR frames and in map reconstruction, a segmentation-based LiDAR odometry and mapping framework is proposed in this paper. Specifically, we first define several classes of features with weak semantic information, which are extracted by a greedy-search-based segmentation algorithm proposed in this paper. Building on this, a novel point cloud registration algorithm is also proposed, formulated and solved as a nonlinear optimization problem. To verify the effectiveness of the proposed model, we collected a large amount of data in an autonomous-driving test area and compared the results with existing state-of-the-art models. The experimental results show that the proposed algorithm runs stably in real-world autonomous driving scenarios and has smaller error and higher robustness than the other models.
With the development of deep neural networks (DNNs), building visual decoding models from functional magnetic resonance imaging (fMRI) to simulate the human visual system and study visual mechanisms has become a research hotspot. Although existing DNN-based visual decoding models achieve a certain accuracy, most ignore the differences between voxels. Among them, the BRNN-based category decoding model uses a bidirectional long short-term memory (LSTM) network to simulate the bidirectional visual information flow, which improves decoding accuracy, but it feeds the voxels of each brain area into the model as an undifferentiated whole. Therefore, we embed a channel attention module, the Squeeze-and-Excitation network (SENet), into the LSTM network to construct an LSTM-SENet visual decoding model with an attention mechanism. The model learns to assign a different weight to each voxel, focusing on important voxels and thereby improving the classification accuracy of natural images. The experimental results show that our method achieves higher (three-level) category decoding accuracy than other methods, further verifying the effectiveness of building visual decoding models on visual mechanisms.
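A minimal SE-style recalibration of the kind described, assuming voxel activities arrive as a flat feature vector (PyTorch; dimensions are illustrative, not the paper's):

import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):        # x: (batch, channels), channels index voxels
        w = self.fc(x)           # learned per-voxel importance weights
        return x * w             # recalibrated features passed on to the LSTM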
Most existing data-driven temperature imaging schemes for Tunable Diode Laser Absorption Spectroscopy (TDLAS) tomography are based on Convolutional Neural Networks (CNNs). However, studies of CNNs show that their effective receptive field is much smaller than the theoretical one, which hinders CNNs from capturing features from long-range contextual information. In this work, a temperature imaging network based on the Swin Transformer is established. To introduce cross-window connections while maintaining the efficient computation of local non-overlapping windows, Multi-headed Self-Attention (MSA) is computed alternately in regularly partitioned windows and in shifted windows. Simulation results show that the proposed network reconstructs temperature images of higher quality than schemes based on CNNs and on the Extreme Learning Machine (ELM), respectively.
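The shifted-window trick itself is compact; a PyTorch sketch of the two core operations (window size and tensor layout are illustrative):

import torch

def cyclic_shift(x, window):
    # x: (B, H, W, C); roll by half a window so window-local attention
    # in the next block sees across the previous block's window borders
    return torch.roll(x, shifts=(-window // 2, -window // 2), dims=(1, 2))

def window_partition(x, window):
    # regroup the map into non-overlapping windows of window*window tokens
    B, H, W, C = x.shape
    x = x.view(B, H // window, window, W // window, window, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, C)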
Brain structure segmentation from 3D magnetic resonance (MR) images is a prerequisite for quantifying brain morphology. Since typical 3D whole-brain deep learning models demand large amounts of GPU memory, 3D image patch-based deep learning methods are favored for their memory efficiency. However, existing patch-based methods are not well equipped to capture the spatial and anatomical contextual information necessary for accurate brain structure segmentation. To overcome this limitation, we develop a spatial and anatomical context-aware network that integrates both kinds of contextual information for accurate brain structure segmentation from MR images. In particular, a spatial attention block encodes the spatial context of the 3D patches, an anatomical attention block aggregates image information across the channels of the 3D patches, and the two attention blocks are adaptively fused by an element-wise convolution operation. Moreover, an online patch sampling strategy is utilized to train the network with all available patches of the training MR images, facilitating accurate segmentation of brain structures. Ablation and comparison results demonstrate that our method achieves promising segmentation performance, better than state-of-the-art alternatives by 3.30% in terms of Dice score.
Ideally, a fused image fully reflects the actual scene and contains rich detail. However, most fusion methods lose details of the fused image and distort the contrast of the raw scene. To solve this problem, we propose a fusion algorithm based on the non-subsampled shearlet transform (NSST) that pays particular attention to the influence of light intensity when calculating the fusion coefficients. The method first decomposes the input images into high- and low-frequency coefficients through the NSST. For the high-frequency coefficients, we calculate the phase congruency (PC) of the decomposed images and combine the results with an adaptive simplified pulse-coupled neural network (SPCNN) to form the fusion parameters. For the low-frequency coefficients, the optimal brightness entropy (OBE) of the input images is used as the fusion basis. The high- and low-frequency sub-band coefficients are then fused by the designed fusion rules, and the final image is obtained through the inverse NSST. Experiments show that our method not only preserves image details and overall luminance while taking care of the overall effect of the image, but also leads on several evaluation indicators.
Inspired by the state-of-the-art performance of MLP-Mixer on image classification, multilayer perceptron (MLP) models have attracted researchers' attention again. Although various MLP architectures have been proposed, most focus on the image classification domain. In this paper, we extend MLPs to image generation based on generative adversarial networks (GANs). A pure MLP model is not friendly to small datasets because it is a data-hungry architecture, so we use a hybrid model with MLP blocks as the generator and CNN blocks as the discriminator. Experimental results demonstrate that our model outperforms a pure CNN network on the CIFAR-10 dataset under two different evaluation metrics. In addition, we apply eight widely used MLP structures to our generator to determine which achieves the best performance on the image generation task.
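The generator's basic unit is the standard Mixer block, a token-mixing MLP followed by a channel-mixing MLP; a PyTorch sketch (hidden sizes are illustrative):

import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, tokens, channels, hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(channels)
        self.token_mlp = nn.Sequential(nn.Linear(tokens, hidden), nn.GELU(),
                                       nn.Linear(hidden, tokens))
        self.norm2 = nn.LayerNorm(channels)
        self.channel_mlp = nn.Sequential(nn.Linear(channels, hidden), nn.GELU(),
                                         nn.Linear(hidden, channels))

    def forward(self, x):                        # x: (B, tokens, channels)
        y = self.norm1(x).transpose(1, 2)        # mix information across tokens
        x = x + self.token_mlp(y).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))  # mix across channels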
Infrared and visible images capture different information about the same scene; in particular, in low-light scenes, infrared images capture information that visible images cannot. To obtain more useful information in dim environments, infrared and visible images can be fused. In this paper, an image fusion method based on anisotropic diffusion and the fast guided filter is proposed. Firstly, the source images are decomposed into base layers and detail layers by anisotropic diffusion. Secondly, the visible and infrared images are passed through a side-window Gaussian filter to obtain saliency maps, which are then passed through the fast guided filter to obtain the fusion weights. Thirdly, the fused base layers and fused detail layers are combined to reconstruct the final fusion image. The side-window Gaussian filter helps reduce artifacts in the fused image. Compared with similar algorithms, the fusion results show that the proposed method is outstanding in both subjective and objective evaluation: it surpasses the other algorithms in standard deviation (STD) and entropy (EN), and its other quality metrics are close to the best comparison algorithm.
This paper proposes a new end-to-end no-reference (NR) video quality assessment (VQA) algorithm that makes use of dimensionality reduction and attention-based pooling. Firstly, the dataset is expanded through data augmentation based on frame sampling. Secondly, cropped video blocks are input to a trainable dimensionality reduction module that uses 3D convolution to reduce the dimension of the data. The reduced data are then fed into the backbone to extract spatial features, which are pooled through attention-based pooling. Finally, the pooled features are regressed to a quality score through a fully connected layer. Experimental results show that the proposed algorithm achieves competitive performance on the LIVE, LIVE Mobile and CVD2014 datasets with low complexity.
A new zero-watermarking algorithm based on deep learning is proposed to improve the robustness of zero-watermarking, in which both zero-watermark generation and copyright verification are completed by neural networks. First, a stylized image is generated from the host image and a time-stamped logo image through a VGG network. Then, the stylized image is encrypted by the Arnold transform and registered as a zero-watermark image for intellectual property protection (IPR). Finally, an RCNN network is designed to extract the logo image and verify the copyright of host images. The experimental results show that the security and robustness of the algorithm are better than those of existing zero-watermarking algorithms.
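The Arnold (cat map) scrambling step is standard and easy to state; a sketch for a square image (the iteration count acts as part of the key):

import numpy as np

def arnold(img, iterations=1):
    n = img.shape[0]                 # assumes a square image
    out = img.copy()
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                # cat map: (x, y) -> ((x + y) mod n, (x + 2y) mod n)
                scrambled[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = scrambled
    return out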
The Transformer has achieved milestones in natural language processing (NLP). Owing to its excellent global and long-range semantic interaction, it has gradually been applied to vision tasks. In this paper, we propose PTIQ, a pure Transformer structure for image quality assessment. Specifically, we use Swin Transformer blocks as the backbone to extract image features. The extracted feature vectors, after an extra state embedding and position embedding, are fed into an original Transformer encoder, and the output is passed to an MLP head to predict the quality score. Experimental results demonstrate that the proposed architecture achieves outstanding performance.
Super-resolution algorithms aim to produce magnified high-resolution versions of low-resolution images. Some methods, however, are prone to generating blur in the process, and simple sharpening filters are adopted to alleviate this type of artifact; the actual effectiveness of this approach is not clear-cut in the literature. This work evaluates the effect of three simple sharpening filters on the quality of images obtained from super-resolution methods. Two metrics were considered: Peak Signal-to-Noise Ratio (PSNR) and Learned Perceptual Image Patch Similarity (LPIPS). One of the filters consistently improved the LPIPS of magnified images from diverse benchmark sets on top of seven super-resolution methods. The gains in the perceptual metric appear to stem from the sharpening effect; improvements in PSNR were not as consistent.
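For orientation, a typical sharpening filter of the kind evaluated is unsharp masking, with PSNR computed against the ground-truth image (an OpenCV/NumPy sketch; LPIPS would come from the separate lpips package):

import cv2
import numpy as np

def unsharp(img, sigma=1.0, amount=0.5):
    blurred = cv2.GaussianBlur(img, (0, 0), sigma)
    return cv2.addWeighted(img, 1 + amount, blurred, -amount, 0)

def psnr(ref, test):
    mse = np.mean((ref.astype(np.float64) - test) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)   # 8-bit images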
Traditional image steganography often embeds secret image information by modifying the pixel values of a carrier image. However, once the pixel values are modified, traces of modification remain that a third party can easily discover, destroying the secret information and causing the transmission to fail. This paper proposes a coverless image steganography algorithm based on style transfer, which uses the style of the secret image to generate camouflage images and transmits them over public channels. The method does not modify the pixel values of carrier images and can resist common attacks. It consists of two parts, a sender and a receiver: in the sending stage, style transfer combines the style of the secret image with the content of a natural image to generate a camouflage image; in the receiving stage, a convolutional neural network (CNNSI) is designed to extract the secret image. The training dataset of the CNNSI network consists of camouflage images subjected to various attacks. Experimental results show that, compared with existing methods, the proposed method can still extract the secret image after the camouflage image has been attacked, offering better robustness and security.
In recent years there has been continuous growth in demand for dried vegetables. This tendency has driven the development of the dehydrated-food market segment in Poland, which makes it possible to manage surplus vegetable production. Dried vegetables are used more and more often in various sectors of the food industry, both because of their high nutritional value and because of changing nutritional habits among customers. Among dried vegetables, dried carrots play a strategic role, since this produce has a wide spectrum of applications and is known for its high nutritional value. The research was conducted to evaluate the quality of dried carrot cubes produced by three different drying techniques; both correctly and incorrectly dried carrots were used. Convolutional neural networks with the MobileNet architecture were trained by deep learning to classify a selected research sample, covering both the type of drying process and, in a binary division by the applied parameters, the quality of drying. The obtained models classified samples with an accuracy of 85-100%, with the exception of lyophilized dried carrots, for which the trained network reached a classification accuracy of 71% on the validation set. The research proved that fast, noninvasive, and highly effective evaluation of the quality of dried carrot cubes under different conditions is possible using artificial neural networks.
Although an attention mechanism is a reasonable choice for generating image captions, obtaining ideal image regions within the mechanism is difficult in practice because of the difficulty of computing attention between image and text data. To improve attention modules for image captioning, we propose an algorithm for handling pixel-wise semantic information obtained as the output of semantic segmentation. The proposed method feeds the pixel-wise semantic information into the attention modules for image captioning together with the input text data and image features. In evaluation experiments, our method obtained more reasonable weighted image features and better image captions, with a BLEU-4 score of 0.306 versus 0.243 for the original attention model.
With the continuous development of multimedia technology and imaging hardware, more and more high-quality images appear in daily life and production. Many image encryption methods exist to ensure image security, but most cannot meet the real-time requirements of large-size image encryption, which greatly hinders the application and popularization of image encryption. In this paper, an efficient image encryption method based on variable row-column scrambling and dynamic-threshold selective block diffusion is designed. Pixels are processed in batches by row, column, and block, and the length of the required chaotic sequence is reduced without reducing security, thereby cutting the time consumption of the encryption system. In addition, a modified five-point sampling scheme is adopted to select the chaotic sequence, which improves its utilization rate and further improves encryption efficiency. Experimental results show that the proposed method encrypts more efficiently than existing methods of similar security and is highly practical.
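As an illustrative stand-in for the scrambling stage (the paper's chaotic system and five-point sampling are not reproduced here), a logistic-map keystream can drive the row/column permutations:

import numpy as np

def logistic_sequence(x0, n, mu=3.99):
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = mu * x * (1 - x)     # chaotic iteration, key = (x0, mu)
        seq[i] = x
    return seq

def scramble_rows_cols(img, key=0.37):
    h, w = img.shape[:2]
    rows = np.argsort(logistic_sequence(key, h))
    cols = np.argsort(logistic_sequence(key * 0.5, w))
    return img[rows][:, cols]    # permute whole rows, then whole columns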
Light field imaging records the spatial and angular information of a scene simultaneously, providing images focused at different depths through computational imaging. However, the number of sensor pixels and the size of the microlens array limit the resolution of refocused images, which makes them difficult to use in downstream tasks. To overcome this limitation, we propose a self-supervised super-resolution algorithm that increases the resolution of refocused images while relying only on image prior information. With the prior information of low-resolution refocused images and the convolutional structure, we not only significantly improve image quality but also sidestep the problem of insufficient training data. Extensive experiments show that the proposed self-supervised approach obtains impressive results and is even comparable to data-hungry supervised learning methods.
With the development of privacy protection, reversible data hiding methods for encrypted images have drawn extensive research interest. Among them, a method based on embedding in prediction errors (the EPE-based method) embeds secret information in the encrypted most significant bit plane: not only can the original image be recovered with high quality, but the payload can also approach 1 bit per pixel (bpp). However, errors can occur when extracting the secret data, because the most significant bits of some pixels are used as flags to mark prediction-error locations. In this paper, a high-capacity reversible data hiding method for encrypted images is proposed that combines most-significant-bit prediction with least-significant-bit compression. First, the most significant bit of each pixel is predicted and a location map of prediction errors in the original image is generated; at the same time, the original image is encrypted with a stream cipher. Then, the location map is embedded into the space vacated by compressing least significant bits, and the secret data are embedded into the most significant bits of pixels without prediction errors. In this way, the marked encrypted image is obtained. Finally, the original image can be recovered without any error, and the secret information can be extracted correctly from the marked encrypted image. Experimental results show that the proposed method outperforms the EPE-based and other methods.
As a primary platform for real-time image processing, the field-programmable gate array (FPGA) is widely used in binocular vision systems, where distortion correction is an important component. When implementing a real-time distortion correction algorithm on an FPGA, problems such as insufficient on-chip storage and the high complexity of coordinate-correction calculations arise; these problems are analyzed in detail in this study. On the basis of the reverse mapping method, a distortion correction algorithm that uses a lookup table (LUT) is proposed, together with a compression-with-restoration method for the LUT to reduce its space occupation and a corresponding caching scheme for the LUT and image data. The algorithm is verified on our binocular stereo vision system based on the Xilinx Zynq-7020. Experiments show that the proposed algorithm achieves real-time, high-precision gray-image distortion correction while significantly reducing on-chip resource consumption, which is sufficient to meet the requirements of an accurate binocular stereo vision system.
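Offline, the correction LUT is essentially the pair of remap tables that OpenCV builds from the calibrated parameters; a sketch (K, dist_coeffs, R, K_new, width, height and raw_frame are assumed calibration outputs and inputs, not names from the paper):

import cv2

# K: camera matrix; dist_coeffs: distortion coefficients;
# R: rectification rotation; K_new: projection matrix after rectification
map_x, map_y = cv2.initUndistortRectifyMap(
    K, dist_coeffs, R, K_new, (width, height), cv2.CV_32FC1)
corrected = cv2.remap(raw_frame, map_x, map_y, cv2.INTER_LINEAR)
# map_x/map_y are what the FPGA would store in compressed form and restore on the fly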
Haze removal is a challenging task in image recovery, because hazy images are degraded by turbid media in the atmosphere and show limited visibility and low contrast. Analysis Sparse Representation (ASR) and Synthesis Sparse Representation (SSR) have been widely used to recover degraded images, but the recovered images often suffer from unexpected noise and detail loss because these methods take relatively little account of the inherent coherence between image patches. In this paper, we therefore propose a new haze removal method based on hybrid convolutional sparse representation, which accounts for adjacency relationships through convolution and superposition. To integrate the optical model into a convolutional sparse framework, we separate the transmission map by transforming the model into the logarithm domain, and we then impose a structure-based constraint on the transmission map to maintain piecewise smoothness and reduce the influence of spurious abrupt depth edges. Experimental results demonstrate that the proposed method restores the fine structure of hazy images and suppresses amplified noise.
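For reference, the optical model in question is the atmospheric scattering model, and one way to read the log-domain separation (our algebra, valid where the atmospheric light A exceeds the observed intensities; the paper's normalization may differ) is:

I(x) = J(x)\,t(x) + A\,(1 - t(x))
\;\Rightarrow\; A - I(x) = \big(A - J(x)\big)\,t(x)
\;\Rightarrow\; \log\!\big(A - I(x)\big) = \log\!\big(A - J(x)\big) + \log t(x),

so the transmission map becomes an additive component that a convolutional sparse model can isolate and regularize.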
With the rapid development of 3D capture technologies, point clouds have been widely used in many emerging applications such as augmented reality, autonomous driving, and 3D printing. However, the point clouds used to represent real-world objects in these applications may contain millions of points, resulting in huge data volumes, so efficient compression algorithms are essential for storage and real-time transmission. In particular, attribute compression of point clouds remains challenging owing to the sparsity and irregular distribution of points in 3D space. In this paper, we present a novel point cloud attribute compression scheme based on inter-prediction of blocks and graph Laplacian transforms of the attribute residuals. Firstly, we divide the entire point cloud into adaptive sub-clouds via K-means on the geometry, which enables efficient representation at less cost. Secondly, each sub-cloud is split into two parts: the attribute means of the sub-clouds, and the attribute residuals after removing the means. For the attribute means, we use inter-prediction between sub-clouds to remove attribute redundancy, and the attribute residuals are encoded after a graph Fourier transform. Experimental results demonstrate that the proposed scheme is much more efficient than traditional attribute compression schemes.
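The graph Fourier transform of a sub-cloud's attribute residual amounts to an eigendecomposition of a distance-based graph Laplacian; a NumPy sketch (the Gaussian edge weighting is one common choice, not necessarily the paper's):

import numpy as np

def graph_transform(points, residual, sigma=1.0):
    # points: (N, 3) sub-cloud geometry; residual: (N,) or (N, 3) attributes
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))       # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian
    _, U = np.linalg.eigh(L)                 # eigenvectors = GFT basis
    return U.T @ residual                    # transform coefficients to encode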
Artificial intelligence (AI) and its applications have developed explosively, not only in the control field but also in signal and information processing. Fuzzy theory is an important branch of AI. In fuzzy enhancement for image processing, the Pal function is often employed as the membership function. Although this function has a good filtering effect, its fuzzy factors are usually set empirically, which leads to inconsistent enhancement across different input images, unclear details in the enhanced image, and often poor overall enhancement. In this paper, the fuzzy factors are treated as variables: an evaluation function is constructed to score the enhancement performance, and a suitable optimization algorithm is used to find the optimal values of the fuzzy factors automatically. Simulation results show the good performance of the improved method.
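One common statement of the Pal-King style enhancement, with the fuzzy factors Fd and Fe left as free variables for the optimizer (a sketch of the standard operator, not the paper's exact variant):

import numpy as np

def fuzzy_enhance(img, Fd=128.0, Fe=2.0, iterations=1):
    x_max = float(img.max())
    # membership function with fuzzy factors Fd (denominational) and Fe (exponential)
    mu = (1 + (x_max - img.astype(np.float64)) / Fd) ** (-Fe)
    for _ in range(iterations):
        # intensification operator: push memberships away from 0.5
        mu = np.where(mu <= 0.5, 2 * mu ** 2, 1 - 2 * (1 - mu) ** 2)
    out = x_max - Fd * (mu ** (-1 / Fe) - 1)   # inverse membership back to gray levels
    return np.clip(out, 0, 255)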
Due to the impact of Corona Virus Disease 2019 (COVID-19), facial masks have become a necessary protective measure for people going out over the last two years. The mouth and nose are covered to suppress the spread of the virus, which poses a huge challenge for face verification, and some existing image inpainting methods cannot repair the covered area well, reducing verification accuracy. In this paper, an algorithm is proposed to repair the area covered by a facial mask and restore the identity information for face authentication. The proposed algorithm consists of an image inpainting network and a face verification network. The inpainting network uses two discriminators, a global discriminator and a local discriminator, and ResNet blocks are employed in both to retain more feature information. Experimental results show that the proposed method generates fewer artifacts and achieves higher Rank-1 accuracy than the other methods discussed.
Multivariate time series data are ubiquitous in the real world, and their modeling and analysis are popular research topics in meteorology, transportation, finance and other fields. Classical statistical methods are primarily aimed at single time series analysis, while deep learning has demonstrated the power to mine patterns from massive amounts of data. A major application of these studies is to analyze collected historical sequences to predict what will happen in the future. Currently, recurrent-neural-network-based and temporal-convolution-based models realize multivariate time series prediction, but these deep models perform mediocrely on long-sequence prediction tasks, on the one hand because of error accumulation and on the other because the collected sequences contain a large amount of high-frequency noise. To improve prediction accuracy and mine more valuable features from the series, we propose ADWT, a novel multivariate time series prediction framework. By designing an adaptive filtering module that operates on the frequency-domain characteristics of the signal, our model removes noise from the time series and fuses the module with a deep prediction module into an end-to-end framework. Experimental results show that our model effectively improves the prediction accuracy of multivariate time series; its performance on three benchmark datasets is competitive with the latest spatial-temporal series prediction models, and it offers good interpretability.
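The adaptive filtering idea in miniature: mask the high-frequency rFFT bins of each series before handing it to the predictor (a PyTorch sketch with a fixed cutoff; in ADWT the mask would presumably be learned rather than fixed):

import torch

def filter_series(x, keep_ratio=0.25):
    # x: (batch, length); zero out the highest-frequency bins
    spec = torch.fft.rfft(x, dim=-1)
    cutoff = int(spec.shape[-1] * keep_ratio)
    mask = torch.zeros_like(spec)
    mask[..., :cutoff] = 1
    return torch.fft.irfft(spec * mask, n=x.shape[-1], dim=-1)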
Aiming at the difficulty of target track initiation with bearing-only measurements and no range information under a complex background, an improved heuristic track initiation algorithm is proposed. Based on the motion characteristics of the target in the azimuth-elevation coordinate system, the target trajectory and the clutter distribution are modeled. Combining the heuristic track initiation algorithm with Kalman filtering builds target tracks rapidly under a complex background and ensures a higher probability of true tracks. The method retains the full performance of the heuristic track initiation algorithm and maintains a high probability of correct tracks even at high clutter densities. Finally, simulations verify the effectiveness of the proposed algorithm.
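A hedged sketch of the filtering stage follows: a constant-velocity Kalman filter over azimuth-elevation measurements of the kind that could confirm candidate tracks produced by heuristic initiation. The gating logic and clutter model from the abstract are omitted, and all noise parameters are illustrative:

```python
import numpy as np

def kalman_cv(z_seq, dt=1.0, q=1e-4, r=1e-3):
    """Constant-velocity Kalman filter over (azimuth, elevation) measurements.

    State: [az, el, az_rate, el_rate]. A minimal sketch of track confirmation;
    the paper's gating thresholds and clutter handling are not reproduced.
    """
    F = np.eye(4); F[0, 2] = F[1, 3] = dt
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1.0
    Q = q * np.eye(4); R = r * np.eye(2)
    x = np.array([z_seq[0][0], z_seq[0][1], 0.0, 0.0])
    P = np.eye(4)
    track = []
    for z in z_seq[1:]:
        x = F @ x; P = F @ P @ F.T + Q             # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)             # Kalman gain
        x = x + K @ (np.asarray(z) - H @ x)        # update
        P = (np.eye(4) - K @ H) @ P
        track.append(x.copy())
    return np.array(track)

if __name__ == "__main__":
    truth = np.stack([0.01*np.arange(20), 0.5 + 0.005*np.arange(20)], axis=1)
    meas = truth + 0.01 * np.random.randn(20, 2)
    print(kalman_cv(meas)[-1])
```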
Facial makeup transfer automatically applies an arbitrary makeup style to a target face without changing the face's identity. BeautyGAN enables unsupervised makeup transfer, but its generated images suffer from several problems: partial loss of the makeup effect, poor transfer when the input faces or backgrounds are complex, and the inability to handle low-resolution images directly. To solve these problems, BeautyGAN, an existing makeup transfer model, was optimized. Drawing on the fast style transfer algorithm, a BeautyGAN-based transfer model was designed that introduces a perceptual loss to improve BeautyGAN's facial feature extraction, and input images are preprocessed by an SRGAN network so that low-resolution images fit the BeautyGAN model. The results show that the optimized BeautyGAN has improved local transfer performance and can run in real time during testing. Compared with BeautyGAN, the makeup transfer effect is significantly improved on inputs with facial expressions, facial occlusion or small-angle poses, and the model is also compatible with low-resolution images.
The quality of solar panels determines the efficiency of photovoltaic power generation. With the rapid development of the photovoltaic industry, panel quality has gradually become the focus of the industry: panel failures limit photoelectric conversion efficiency and service life, and pose a major challenge to the overall safety of the photovoltaic system. This article therefore proposes a solar panel fault diagnosis method based on the YOLOv3 algorithm. On top of YOLOv3, the method optimizes the learning-rate configuration, determines the optimal anchor boxes, and avoids assigning multiple anchor boxes to the same part, and it can detect many different types of target failure points simultaneously. Experimental results verify the effectiveness of the algorithm.
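The abstract does not specify how the optimal anchor boxes are determined; a common approach for YOLOv3, shown below as a sketch rather than the authors' exact procedure, is k-means clustering of the training boxes' width/height pairs under a 1-IoU distance:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between boxes and anchors compared by width/height only."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """k-means with 1-IoU distance, the usual way YOLOv3 anchors are tuned."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest anchor
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

if __name__ == "__main__":
    wh = np.abs(np.random.randn(500, 2)) * 50 + 10   # stand-in box sizes
    print(kmeans_anchors(wh, k=6))
```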
For real-time indoor localization of workers in factory workshops and corridors, a pre-trained deep-learning YOLOv3 detection model is used to realize visual localization of unmarked dynamic targets with a monocular camera. The method needs only one fixed-position camera in the measured area to detect and localize moving targets in real time. The algorithm is verified by simulation and experiment, achieving personnel localization errors of 8.2 cm along the X axis and 19.57 cm along the Y axis. Compared with other localization methods, it offers relatively low hardware cost, simple system setup, high algorithm portability, good practicability and industrial promotion value.
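One standard way to turn a monocular detection into a floor position, offered here only as a plausible sketch of such a system (the paper's exact localization model is not specified), is a pixel-to-floor homography calibrated once from known floor markers:

```python
import numpy as np

def floor_homography(img_pts, floor_pts):
    """Solve the 3x3 homography H mapping image pixels to floor coordinates
    via the direct linear transform from >= 4 point correspondences."""
    A = []
    for (u, v), (x, y) in zip(img_pts, floor_pts):
        A.append([u, v, 1, 0, 0, 0, -x*u, -x*v, -x])
        A.append([0, 0, 0, u, v, 1, -y*u, -y*v, -y])
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    return vt[-1].reshape(3, 3)

def locate(H, u, v):
    """Project a detected person's foot point to metric floor coordinates."""
    p = H @ np.array([u, v, 1.0])
    return p[:2] / p[2]

if __name__ == "__main__":
    # hypothetical floor markers measured once at camera installation
    img = [(100, 400), (540, 410), (520, 120), (130, 110)]
    flr = [(0.0, 0.0), (4.0, 0.0), (4.0, 6.0), (0.0, 6.0)]
    H = floor_homography(img, flr)
    print(locate(H, 320, 260))   # e.g. bottom-center of a YOLOv3 person box
```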
Xanthoceras sorbifolium Bunge is an edible-oil tree in China with very high economic value, but picking the mature fruit in time has long troubled farmers. To rapidly, automatically and accurately identify mature Xanthoceras sorbifolium in the field, a mobile data acquisition and transmission system was first designed on an Internet of Things architecture, providing image acquisition and positioning tools for timely and accurate picking. Second, a maturity identification network was constructed on the lightweight, efficient YOLOv3 model using a convolutional neural network (CNN) with inverted (flipped) residual blocks. The optimal identification model was evaluated, and the results indicate that it can serve as a maturity identification tool for Xanthoceras sorbifolium with an mAP of 97.04%.
Compared with point features, line features in the environment carry more structural information. When indoor texture is sparse, making full use of the structural information of line features can improve the robustness and accuracy of simultaneous localization and mapping. In this paper, we propose an improved monocular-inertial indoor localization algorithm that considers both point and line features. First, the point and line features in the environment are extracted, matched and parameterized, and the inertial sensor is used to estimate the initial pose. A tightly coupled back-end optimization then jointly minimizes the observation errors of the point and line features and the measurement error of the inertial sensor to accurately estimate the pose of the unmanned aerial vehicle. Finally, loop closure detection and pose graph optimization refine the pose in real time. Tests on public datasets show that under sufficient light and texture the localization accuracy of the proposed method is better than 10 cm, the angular accuracy is better than 0.05 rad, and positioning results are output at 10 Hz, effectively improving on traditional visual-inertial localization and meeting real-time requirements.
Early diagnosis and regular monitoring of osteoporosis are key to preventing further deterioration and fractures in osteoporosis patients. Dual-energy X-ray Absorptiometry (DXA), despite being the gold standard for diagnosing osteoporosis, is not routinely ordered due to the limited availability of DXA machines, especially in developing countries; as a result, orthopedists often lack DXA results at the time of examination. This study aims to develop an automated AI system that predicts osteoporosis from a plain X-ray scan of the patient's femur together with demographic data such as age, height and weight. The system first performs instance segmentation on the X-ray scan to locate the femur, applies image processing techniques to measure the inner and outer diameters of the femur, and then computes the cortical thickness index (CTI). The CTI value, together with the patient's demographic data, is fed into a classification model that predicts whether the patient has osteoporosis. We found that the CTI calculated by the AI system is comparable to manually calculated CTI, and the system predicts osteoporosis with an accuracy of 85.3% using CTI and patient data.
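The CTI computation reduces to a simple ratio; a small sketch follows, where the ~0.4 decision threshold and the measurement site are illustrative values from the broader CTI literature, not figures taken from this paper:

```python
def cortical_thickness_index(outer_mm, inner_mm):
    """CTI = (outer diameter - inner canal diameter) / outer diameter,
    conventionally measured on the femoral shaft below the lesser
    trochanter (site assumed here, not stated in the abstract)."""
    if outer_mm <= 0 or inner_mm < 0 or inner_mm >= outer_mm:
        raise ValueError("diameters must satisfy 0 <= inner < outer")
    return (outer_mm - inner_mm) / outer_mm

# e.g. a 32 mm shaft with a 22 mm medullary canal -> CTI ~ 0.31,
# below the ~0.4 level often associated in the literature with
# osteoporotic bone (threshold illustrative only)
print(round(cortical_thickness_index(32.0, 22.0), 3))
```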
How to handle the scale variation and background interference that crowd counting algorithms face in practical applications is still an open problem. In this paper, to tackle these problems, we propose the Attention-guided Feature Fusion Network (AFFNet) to learn the mapping between a crowd image and its density map. In this network, the Channel-attentive Receptive Field Block (CRFB) is constructed from parallel convolutional layers with different dilation rates to extract multi-scale features. By adopting attention masks generated from high-level features to adjust low-level features, the Feature Fusion Module (FFM) alleviates background interference at the feature level. In addition, the Double Branch Module (DBM) generates the density estimation map, further suppressing background interference at the density level. Extensive experiments on several challenging benchmark datasets, including ShanghaiTech, UCF-QNRF and JHU-CROWD++, demonstrate that our proposed method is superior to state-of-the-art approaches.
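A hedged PyTorch sketch of the CRFB idea follows: parallel dilated convolutions fused under a channel-attention gate. The layer widths, dilation rates and gating design are assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class CRFB(nn.Module):
    """Sketch of a channel-attentive receptive field block: parallel 3x3
    convolutions with different dilation rates capture multiple scales,
    and a squeeze-excitation-style gate reweights the fused channels."""
    def __init__(self, c_in, c_out, rates=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c_out // 4, c_out, 1), nn.Sigmoid())

    def forward(self, x):
        fused = torch.stack([b(x) for b in self.branches]).sum(0)
        return fused * self.gate(fused)      # channel attention

if __name__ == "__main__":
    y = CRFB(64, 128)(torch.randn(1, 64, 32, 32))
    print(y.shape)   # torch.Size([1, 128, 32, 32])
```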
We present Deformable Voxel Grids (DVGs) for 3D shape comparison and processing. A DVG is a voxel grid that is deformed to approximate the silhouette of a shape via energy minimization. Interpreted as a local coordinate system, it provides a better embedding space than a regular voxel grid because it is adapted to the geometry of the shape. It also allows deforming the shape by moving the DVG's control points, in a manner similar to Free-Form Deformation but with more interpretable control point positions. After proposing a computation scheme for the energies that is compatible with meshes and point clouds, we demonstrate DVGs in a variety of applications: correspondences via cubification, style transfer, shape retrieval and PCA deformations. The first two require no learning and can be run on any shapes in a matter of minutes on modest hardware. The last two require first optimizing DVGs on a collection of shapes, which amounts to a pre-processing step; determining PCA coordinates is then straightforward and yields a few parameters with which to deform a shape.
When projecting onto a non-white surface, the projected image is distorted and color-shifted by the surface's complex luminance and chrominance, so the projection result differs from the human eye's visual perception. Projection image correction aims to remove these effects; traditional solutions estimate parameters from collected projection samples, compute an inverse model of the projection imaging process, and try to fit a correction function. In this paper, a deep-neural-network-based projection image correction network (PICN) is designed to learn such complex correction functions implicitly. PICN consists of a U-shaped backbone, a convolutional neural network that extracts projection-surface features, and a perceptual loss network that optimizes the correction results. This structure extracts both the deep features and the surface interference features of the projected image, and makes the corrected projection better match human visual perception. In addition, we built a projector-camera system under fixed global illumination for verification experiments, and demonstrated the effectiveness of the proposed method by evaluating metrics on projected images before and after correction.
With the explosive advancement of image display technology, recapturing high-quality images from high-fidelity LCD screens has become increasingly easy. Such recaptured images can be used both to deceive intelligent recognition systems and to hide tampering traces. To close this security loophole, we propose a recaptured-image detection approach based on a deep hybrid correlation network. Specifically, we first design a deep hybrid correlation module to extract the correlations between color channels and between neighboring pixels. This module has three branches: a 1×1 convolution layer learns the correlations between color channels, while two consecutive convolution sub-modules extract the correlations between neighboring pixels. The module's output is then fed into consecutive convolution modules that learn a hierarchical representation for decision making. Ablation experiments verify the effectiveness of the proposed deep hybrid correlation module, and single-database experiments show that our method achieves about 99% average accuracy on three public databases. It performs very close to the state-of-the-art methods on the most difficult-to-detect ICL-COMMSP database and the relatively low-quality NTU-ROSE database, and noticeably improves performance on the most diverse Dartmouth database, verifying the effectiveness of the proposed architecture. Mixed-database experiments further verify the superior generalization ability of the method.
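A minimal PyTorch sketch of the three-branch hybrid correlation idea follows; the channel widths and kernel sizes are assumptions, and only the division of labor (1×1 convolution for cross-channel correlation, stacked convolutions for neighboring-pixel correlation) follows the abstract:

```python
import torch
import torch.nn as nn

class HybridCorrelation(nn.Module):
    """Sketch: a 1x1 convolution mixes only color channels (inter-channel
    correlation), while stacked spatial convolutions look at neighboring
    pixels (spatial correlation); outputs are concatenated for the
    downstream classifier."""
    def __init__(self, c=32):
        super().__init__()
        self.channel_branch = nn.Conv2d(3, c, kernel_size=1)
        self.spatial_a = nn.Sequential(
            nn.Conv2d(3, c, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))
        self.spatial_b = nn.Sequential(
            nn.Conv2d(3, c, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1))

    def forward(self, x):
        return torch.cat([self.channel_branch(x),
                          self.spatial_a(x), self.spatial_b(x)], dim=1)

if __name__ == "__main__":
    print(HybridCorrelation()(torch.randn(2, 3, 64, 64)).shape)
```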
In this paper, a regression formulation is adopted to fuse multi-focus images with an end-to-end generative adversarial network (GAN). In the generator, image features are extracted through multi-branch and dense connections, and an ECA module is embedded during high-dimensional feature extraction to improve the network's capability. In the discriminator, the relativistic GAN formulation is used to predict the relative authenticity between images. With this design and a reasonable network construction, the proposed method obtains good fusion results, and the experiments demonstrate that it also performs well under objective evaluation, outperforming the compared algorithms.
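The ECA module referenced here is the published Efficient Channel Attention block (Wang et al., CVPR 2020); a compact PyTorch rendering of that design, with an illustratively chosen kernel size, is:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a global average pool followed by a
    1-D convolution across the channel descriptor, which captures local
    cross-channel interaction without the dimensionality reduction of
    SE blocks."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                     # (B, C) channel descriptor
        y = self.conv(y.unsqueeze(1)).squeeze(1)   # local channel mixing
        return x * torch.sigmoid(y)[:, :, None, None]

if __name__ == "__main__":
    print(ECA()(torch.randn(2, 64, 16, 16)).shape)
```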
Displacement measurement of a floating wind turbine model under wind-wave loads is an important stability assessment indicator, but conventional contact methods are hard to deploy in such complex, access-limited environments. We therefore develop a high-speed videogrammetry technique that measures displacement from massive image sequences, offering non-contact operation, high frame rates and dense three-dimensional measurement compared with traditional approaches. First, the model's motion in the image sequences is acquired with accurate target recognition and tracking algorithms. Second, the cameras' interior and exterior parameters are determined through camera calibration and bundle adjustment, respectively. Third, the targets' spatial positions and displacements in the X, Y and Z directions are calculated based on videogrammetry theory. The experimental results demonstrate that the spatial positions calculated by the proposed approach reach submillimeter accuracy in all three directions compared with a high-accuracy total station, and the credibility of the displacement results is further supported by the consistency between the measured and theoretical frequencies.
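The final step, recovering a 3D position from two calibrated views, can be illustrated with standard linear (DLT) triangulation; this is a common choice offered as a sketch, since the paper's exact formulation is not given:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one target from two calibrated cameras.
    P1, P2 are 3x4 projection matrices from calibration/bundle adjustment;
    x1, x2 are the tracked target's pixel coordinates in each view."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                        # (X, Y, Z) in object space

if __name__ == "__main__":
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
    X_true = np.array([0.2, -0.1, 5.0, 1.0])
    x1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
    x2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
    print(triangulate(P1, P2, x1, x2))         # ~ [0.2, -0.1, 5.0]
```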
To make full use of multi-source remote sensing image resources in change detection, parallel computing over coarse-grained data is a very important choice. Multi-temporal SAR (synthetic aperture radar) change detection data are contaminated by coherent speckle and growing explosively in scale, making processing computationally expensive and time-consuming; even with the development of remote sensing big data and Artificial Intelligence (AI), traditional methods struggle to provide an efficient change detection algorithm for SAR applications. Taking data processing and task computing as the basis for parallelization, we propose a parallel change detection method for multi-temporal SAR images based on the wavelet transform, combined with a Q-learning model for intelligent parallel processing. First, based on the statistical features of SAR images and Convolutional Neural Network pixel-level (CNN-Pixel) semantic analysis, an efficient change detection method using Q-learning semantic analysis and the wavelet transform is formulated. Second, pixels and tasks are preprocessed, and a Gaussian mixture model over the pixel-space data and a conjugate-gradient-transform task model are computed in parallel while preserving change detection accuracy. Third, a Bellman equation model is established over the time-series Q-learning states, treating surrounding paths in the changing target data as obstacles, so that conjugate gradient computation and iterative threshold optimization extract the features of the changed areas and build a multi-sample training set for SAR image change detection. The experiments show that the parallel change detection method achieves better results and computing power, and that the Q-learning method is applicable to local change detection in remote sensing images; a reinforcement learning platform is adopted to find the best-fitness threshold within the hybrid data- and task-parallel computing model.
In this paper, a dynamic three-dimensional measurement method based on a convolutional neural network and a binocular structured-light system is proposed. We propose a convolutional neural network that extracts the real and imaginary parts of the first-order spectrum of a single-frame fringe pattern. In our learning model, the loss function is built with output consistency, phase consistency and feature consistency as joint constraints, and the dataset is built from actual deformed patterns of different scenes and frequencies. Furthermore, a dual-frequency stereo phase unwrapping algorithm based on a virtual plane is designed. Combined with the network, the absolute phase can be obtained from only two fringe projections within the measurement range, enabling dynamic three-dimensional reconstruction of discontinuous or multiple isolated objects. The experimental results show that the proposed network improves phase retrieval accuracy by a factor of 20 compared to Fourier Transform Profilometry, and the measurement error of the proposed system on a calibration sphere is less than 0.04 mm. Measurement of a palm unfolding further verifies the feasibility and effectiveness of the proposed method for dynamic processes.
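The dual-frequency unwrapping step admits a compact illustration. Assuming the classical temporal two-frequency scheme (the paper's virtual-plane stereo variant is more involved), the unambiguous low-frequency phase selects the fringe order of the wrapped high-frequency phase:

```python
import numpy as np

def dual_freq_unwrap(phi_high, phi_low, f_high, f_low=1.0):
    """Temporal phase unwrapping with two fringe frequencies: the low
    (unit) frequency phase is unambiguous and selects the fringe order k
    of the wrapped high-frequency phase."""
    k = np.round((f_high / f_low * phi_low - phi_high) / (2 * np.pi))
    return phi_high + 2 * np.pi * k

if __name__ == "__main__":
    truth = np.linspace(0, 16 * np.pi, 1000)       # absolute phase, 8 periods
    phi_h = np.angle(np.exp(1j * truth))           # wrapped to (-pi, pi]
    phi_l = truth / 8.0                            # unit-frequency phase
    print(np.allclose(dual_freq_unwrap(phi_h, phi_l, 8.0), truth))
```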
Motion compensation (MoCo) is an important step in obtaining a high-quality synthetic aperture sonar (SAS) image. In this paper, a novel three-dimensional (3D) backprojection (BP) algorithm is proposed to solve the MoCo problem for multiple-receiver SAS with six-degrees-of-freedom (DOF) motion error. To improve the MoCo capacity of the proposed 3D BP algorithm, more accurate sonar array position data are calculated by space rectangular coordinate transformation. Using the fixed geometric relationship between the sonar array and the inertial navigation system (INS), the array positions at the INS sampling times are obtained from the INS position and attitude outputs without any approximation, and the array positions at the pulse transmission times are then obtained by linear interpolation. Considering the movement of the SAS during signal propagation, the propagation time for each pulse and each receiver is calculated, yielding the position of each receiver at the time of signal reception. Based on these derived positions, the 3D BP algorithm produces a well-focused SAS image while simultaneously compensating the six-DOF motion errors. Experimental results demonstrate the validity of the proposed algorithm.
P-band ultra-wideband synthetic aperture radar (UWB SAR) combines high-resolution imaging with good foliage penetration, giving it the potential to detect and image targets concealed under vegetation. However, the P-band carries many radio, television and mobile communication signals, known as radio frequency interference (RFI). These RFI signals mix with the target echoes and cause serious interference in P-band UWB SAR imaging. The traditional notch method is easy to implement and has been widely used, but it notches each pulse echo individually, which has a high computational complexity, and suppressing each pulse separately always leaves a large amount of residual interference, so its suppression effect is poor. Building on the traditional notch method, this paper proposes an RFI suppression method based on a two-dimensional frequency-domain (2DFD) notch, which processes all echo pulses at once and thus improves the efficiency of RFI suppression. Moreover, because the bandwidth of the RFI is much smaller than that of the SAR echo, transforming the received echoes into the 2DFD further concentrates the RFI energy, improving the suppression effect. Simulation results show that the proposed 2DFD notch method improves both the efficiency and the effectiveness of RFI suppression.
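A minimal sketch of the 2DFD notch idea: transform the whole pulse-by-range echo matrix at once, zero the outlier bins where narrow-band RFI concentrates, and invert. The quantile threshold below is an assumed stand-in for the paper's detection rule:

```python
import numpy as np

def notch_2dfd(echo, quantile=0.999):
    """Suppress narrow-band RFI by notching outlier bins in the 2-D
    frequency domain of the whole echo matrix (pulses x range samples),
    so that all pulses are processed in one pass."""
    spec = np.fft.fft2(echo)
    mag = np.abs(spec)
    spec[mag > np.quantile(mag, quantile)] = 0.0   # notch the RFI spikes
    return np.fft.ifft2(spec)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    echo = rng.standard_normal((128, 256)) + 1j * rng.standard_normal((128, 256))
    t = np.arange(256) / 256
    rfi = 50 * np.exp(2j * np.pi * 40 * t)         # narrow-band interferer
    contaminated = echo + rfi[None, :]
    cleaned = notch_2dfd(contaminated)
    print(np.linalg.norm(cleaned - echo) < np.linalg.norm(contaminated - echo))
```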
Estimating lower-extremity joint angles during gait is essential for biomechanical analysis and clinical purposes. Traditionally, infrared-light-based motion capture systems are used to obtain joint angle information, but such systems are restricted to the lab environment, limiting their applicability in daily living. Inertial Measurement Unit (IMU) sensors can remove this restriction but are normally required on each body segment, causing discomfort and impracticality in everyday use. It is therefore desirable to build a system that measures joint angles in daily living while ensuring user comfort. To this end, this paper uses deep learning to estimate joint angles during gait from only two IMU sensors mounted on the participants' shoes, under four walking conditions: treadmill, overground, stair and slope. Specifically, we leverage Gated Recurrent Unit (GRU), 1D and 2D convolutional layers to create sub-networks and average their outputs to obtain the final model in an end-to-end manner. Extensive evaluations show that the proposed method outperforms the baseline, improving the Root Mean Square Error (RMSE) of joint angle prediction by up to 32.96%.
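A hedged PyTorch sketch of the recurrent sub-network only: the input channel count assumes 2 IMUs × 6 axes, the hidden size and number of output angles are illustrative, and the paper's full model averages this with 1D/2D convolutional sub-networks:

```python
import torch
import torch.nn as nn

class JointAngleGRU(nn.Module):
    """Sketch of the GRU sub-network: two shoe-mounted IMUs give 12
    channels per time step (2 sensors x 3-axis accel + 3-axis gyro),
    and a GRU regresses lower-limb joint angles per frame."""
    def __init__(self, in_ch=12, hidden=64, n_angles=6):
        super().__init__()
        self.gru = nn.GRU(in_ch, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_angles)

    def forward(self, x):                   # x: (batch, time, channels)
        h, _ = self.gru(x)
        return self.head(h)                 # angle estimates per time step

if __name__ == "__main__":
    model = JointAngleGRU()
    print(model(torch.randn(4, 200, 12)).shape)   # (4, 200, 6)
```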
Tobacco is a valuable plant in the agricultural and commercial industries; any disease infecting the plant may lower the harvest and disrupt the supply chain. Image-based deep learning methods are cutting-edge technologies that can make disease diagnosis efficient and effective when a large-scale dataset is available for training. However, no public tobacco dataset currently exists, and a comprehensive dataset is urgently needed to bring deep learning methods to tobacco cultivation. In this paper, we create a dedicated tobacco disease dataset, the Tobacco Plant Disease Dataset (TPDD), from 2,721 tobacco leaf images taken in the field. The dataset serves two purposes: disease classification and leaf detection. For classification, we identify 12 classes and provide two types of disease annotation: (1) whole-leaf sections and (2) disease-fragment sections. For leaf detection, we provide two kinds of bounding box: rectangular and polygonal. In addition, we conduct baseline experiments to illustrate the usefulness of TPDD: (1) using deep learning models to detect single and multiple diseases, and (2) using YOLO-v3 and Mask-RCNN to detect leaves. We hope the dataset will support the tobacco industry and serve as a benchmark for fine-grained visual classification.
Sleep-disordered breathing (SDB), a common sleep disorder, manifests as episodes of shallow or paused breathing during sleep called respiratory events. SDB is conventionally diagnosed from overnight multi-channel polysomnography (PSG), but this process requires experienced sleep technicians to annotate the recordings and is quite labour-intensive. In this study, a novel one-dimensional-signal object detection network is proposed for automatic, efficient detection and classification of different kinds of respiratory events in continuous PSG signals. Our method locates respiratory events in the PSG data and classifies them into four categories for further clinical treatment. The method was validated on a clinical PSG dataset collected from Beijing Tongren Hospital, achieving precision, recall and F1-score of 84.9%, 85.1% and 85.0% for event detection, with a total accuracy of 74.9% in classifying the detected events. The results show that one-dimensional signal object detection is a promising way to locate characteristic waveforms and extract signal features, and the method can be applied to other signal feature detection tasks.
In mountainous cities, tall buildings, mountains and other high obstructions cause fading, which can make the primary user's signal weak or even unrecognizable. To address this problem, a Relevance Vector Machine (RVM) based spectrum sensing method is proposed in this paper. First, a cognitive radio (CR) user selection mechanism based on location correlation is designed, and the CR users with the best sensing performance are selected to participate in sensing the primary user (PU). Second, parameters that reflect the characteristics of the PU signal are selected as sample features. Finally, signal samples received in both the presence and the absence of the PU are classified using the RVM. The experimental results show that the proposed algorithm achieves high classification and detection performance at low signal-to-noise ratios and effectively senses the PU signal.
With the growth of the digital data we generate, a large number of deep learning models have been proposed for data mining. Representation learning offers an exciting avenue for data mining by embedding data into a feature space. In healthcare, most existing methods mine electronic health record (EHR) data by learning medical concept representations. Despite the vigorous development of this field, we find that the contextual information of medical concepts, which is important for representing them, has been consistently overlooked. Given these limitations, we design a novel medical concept representation method equipped with a self-attention mechanism that learns contextual representations from EHR data and prior knowledge. Extensive experiments on medication recommendation tasks verify that the designed modules consistently benefit model performance.
A low-frequency ultra-wideband bistatic synthetic aperture radar (UWB BSAR) system can penetrate foliage, produce high-resolution BSAR images, and offer increased target information. In this paper, the electromagnetic scattering characteristics relevant to low-frequency UWB BSAR are analyzed. First, targets under foliage are modeled and discussed. Then, the method of moments (MoM) is applied to compute the electromagnetic scattering characteristics. Finally, a simulation experiment on the modeling and analysis of the targets' electromagnetic scattering verifies the correctness of the low-frequency UWB BSAR scattering analysis.