Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339601 (2024) https://doi.org/10.1117/12.3054465
This PDF file contains the front matter associated with SPIE Proceedings Volume 13396, including the Title Page, Copyright information, Table of Contents, and Conference Committee information.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks. You are receiving this notice because your organization may not have SPIE eBooks access. Shibboleth/OpenAthens users: please sign in to access your institution's subscriptions. To obtain this item, you may purchase the complete book in print or electronic format on SPIE.org.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339602 (2024) https://doi.org/10.1117/12.3050404
Due to problems such as edge blurriness and noise interference in liver medical images, automated liver and liver-tumor segmentation remains challenging. To address these problems, we propose a hybrid network called U-Trans Net, which combines Convolutional Neural Networks (CNNs) and Transformer models. In the encoding stage, it uses parallel CNN and Transformer branches for multi-scale feature extraction, fusing global and local information in a hierarchical manner. The resulting enhanced feature representation is then decoded for prediction. Experimental results on the LiTS-ISBI2017 dataset demonstrate that our method improves segmentation accuracy.
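The U-Trans Net implementation is not included in the abstract; as a toy numeric sketch of the parallel local/global branch idea (all names here, `local_branch`, `global_branch`, and `fuse`, are hypothetical and not the authors' API), fusing a conv-like local feature with a transformer-like global context might look like:

```python
import numpy as np

def local_branch(x, k=3):
    """Conv-like local features: mean over a sliding k x k window (edge padding)."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=float)
    h, w = x.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def global_branch(x):
    """Transformer-like global context: every position sees the global mean."""
    return np.full_like(x, x.mean(), dtype=float)

def fuse(x):
    """Hierarchical fusion: stack local and global feature maps along a channel axis."""
    return np.stack([local_branch(x), global_branch(x)], axis=0)

feat = fuse(np.arange(16, dtype=float).reshape(4, 4))  # shape (2, 4, 4)
```

In the real network, each branch would be a learned CNN or Transformer stage at several scales; the sketch only illustrates why the two branches carry complementary (local vs. global) information.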
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339603 (2024) https://doi.org/10.1117/12.3050393
This paper focuses on traditional deep learning-based no-reference (or reference-based) image quality assessment (IQA) methods, enhancing them from the perspective of image feature extraction. It replaces the VGG16 network with the ResNet50 network for feature extraction and uses a Global Average Pooling (GAP) layer instead of FC512. Subsequently, it computes the weighted average of quality scores for different parts of the image to obtain the overall image quality. Specifically, the paper first preprocesses images by cropping, flipping, mirroring, tilting, and other methods to expand the image dataset and make it more reflective of real-world scenarios. Then, it utilizes the ResNet50 network for feature extraction, showing superior performance compared to the VGG network. Finally, a weighted pooling method is employed to derive the final image score. On the TID2013 and CLIVE datasets, the Pearson Linear Correlation Coefficient (PLCC) values are 0.877 and 0.7095, respectively, while the Spearman Rank Order Correlation Coefficient (SROCC) values are 0.8510 and 0.6956. These values surpass those obtained using traditional algorithms like SSIM and GMSD, indicating the superior predictive performance of the new algorithm. Moreover, the proposed algorithm demonstrates advantages in speed and accuracy, meeting real-time application requirements more effectively.
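The weighted pooling step described above reduces to a weighted average of per-patch quality scores. A minimal sketch (function and variable names are hypothetical; how the weights are produced is the paper's contribution and is not reproduced here):

```python
import numpy as np

def weighted_image_score(patch_scores, patch_weights):
    """Overall image quality = weighted average of per-patch quality scores."""
    s = np.asarray(patch_scores, dtype=float)
    w = np.asarray(patch_weights, dtype=float)
    return float((s * w).sum() / w.sum())

# Three patches; the middle patch is weighted twice as heavily.
score = weighted_image_score([80.0, 60.0, 90.0], [1.0, 2.0, 1.0])  # -> 72.5
```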
Song Huang, Shutao Xiong, Fei Wang, Longyi Zhang, Dingding Fan
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339604 (2024) https://doi.org/10.1117/12.3050550
Fiber-layer fracture in CFRP (Carbon Fiber Reinforced Polymer) cylinders is one of the most common forms of damage during service. Once this damage occurs while the cylinder is under load, it can propagate and eventually lead to extremely serious consequences. Currently, there is very little research on classifying the degree of fiber-layer fracture. In this paper, we use acoustic emission response signals from artificially simulated scenarios of different degrees of fiber fracture, perform Wigner-Ville analysis to obtain feature images, and apply an improved HRNet for damage-degree recognition. The improved HRNet introduces ASPP (Atrous Spatial Pyramid Pooling) into the classification network, which effectively resists environmental noise and provides a stable, sensitive response to fiber fracture. In the experiments, it effectively recognizes three different degrees of fiber fracture damage.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339605 (2024) https://doi.org/10.1117/12.3050616
As the population ages, early diagnosis and treatment of lung diseases become increasingly important. Accurate assessment of aging-related changes in lung CT images is crucial for the prevention and treatment of related diseases. Traditional methods for lung aging assessment from CT images are time-consuming, subjective, and heavily reliant on the clinical experience of doctors. To address these issues, this paper proposes a lung aging assessment method based on 3D-CA Net. The feature extraction part of the proposed network consists of four main 3D Convolutional and Composite Multidimensional Attention Modules. By introducing the Composite Multidimensional Attention Module, the advantages of both spatial attention and self-attention are utilized. Additionally, an improved E-cross-entropy loss function is employed to reduce overfitting and enhance generalization. Experimental results demonstrate that 3D-CA Net significantly outperforms existing methods in terms of accuracy, macro-averaged precision, macro-averaged recall, and macro-averaged F1 score. This work provides a comprehensive solution for lung CT image aging assessment and offers insights for future advancements in medical imaging analysis.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339606 (2024) https://doi.org/10.1117/12.3050753
This study explores a construction-consumption determination method based on convolutional cloud neural network image recognition technology. By analyzing the application of convolutional cloud neural networks in image recognition, and combining this with actual conditions and data collection at the construction site, such a determination method is proposed. Image data from the construction site are processed and analyzed with deep learning algorithms to accurately determine the consumption of various materials and resources during construction. The results show that the method has high accuracy and practicability and can provide effective support for construction management and cost control.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339607 (2024) https://doi.org/10.1117/12.3050758
Taking wall specimens from Guiyang, a city in the hot-summer/cold-winter zone, as the research object, hollow defects of different areas and sizes were fabricated in the surface layers of different wall specimens. A Guide PS800 infrared thermal imager was used to detect the defects in the surface layer. The results show that identification of hollow defects is independent of the ambient temperature and of whether the surface layer is exposed to sunlight; defects with an area smaller than 100 mm × 100 mm are difficult to identify.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339608 (2024) https://doi.org/10.1117/12.3050534
In recent years, the attention mechanism has played a significant role in enhancing algorithm performance in deep learning-based visual tasks. Most methods focus on developing more complex attention mechanisms to improve network performance, which inevitably increases the computational complexity of the model. To balance performance and computational complexity, this paper proposes the CAM (Compound Attention Module) attention mechanism, which delivers substantial performance improvements with only a slight increase in parameters. The CAM module operates in two dimensions: space and channel. It is a lightweight, plug-and-play module with minimal computational overhead. We validate our CAM module through extensive experiments on image classification and object detection tasks using the CIFAR-100 and VisDrone2019 datasets. Experimental results demonstrate that the model consistently improves image classification and object detection performance, outperforming similar modules.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339609 (2024) https://doi.org/10.1117/12.3050460
Although existing Multi-View Video Coding (MVC) achieves high coding efficiency, it also increases the risk of error propagation. This article studies multi-view, multi-description video coding from the perspective of improving fault tolerance, and proposes a multi-view, multi-description video coding structure based on pattern and prediction-vector reuse. The experimental results show that the proposed algorithm significantly reduces encoding complexity while maintaining high coding efficiency, and that its performance is superior to other existing algorithms.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960A (2024) https://doi.org/10.1117/12.3050429
With increasing library visits, book misplacement has become a significant issue, affecting over 5% of books globally and burdening library staff. To address this, we developed a system using YOLOv8 with an enhanced CBAM (Convolutional Block Attention Module) for spine segmentation. The HSV (Hue, Saturation, Value) algorithm segments the book label area, and Optical Character Recognition (OCR) is used to extract call numbers. Book misplacement is detected using ASCII codes, and results are presented to library staff. Experimental results show that this approach significantly improves book spine recognition accuracy compared to traditional methods and better meets users' needs.
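The ASCII-based misplacement check described above can be illustrated with a small sketch (lexicographic string comparison is a simplification of real call-number collation, and the names here are hypothetical, not the paper's code):

```python
def find_misplaced(call_numbers):
    """Flag shelf positions whose call number breaks ascending (ASCII/lexicographic)
    order relative to the previous book on the shelf."""
    return [i for i in range(1, len(call_numbers))
            if call_numbers[i] < call_numbers[i - 1]]

# Position 2 breaks the ascending order and is reported to library staff.
shelf = ["QA76.1", "QA76.5", "QA75.9", "QA76.9"]
misplaced = find_misplaced(shelf)  # -> [2]
```

A production system would parse the OCR-extracted call numbers into their class, cutter, and year components before comparing, since plain string order mishandles numbers of different lengths.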
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960B (2024) https://doi.org/10.1117/12.3050445
Liquid crystal display (LCD) screens are widely used in various types of smart meters, with ultrasonic water meters being one of their applications. The display on an LCD screen is composed of various digits and icons, LCD pattern elements that carry significant attribute information. Current industrial inspection of LCD pattern elements often relies on manual meter reading, which has high labor costs and low accuracy. Therefore, in this study, the deep learning-based ConvNeXt model is used to achieve automatic detection of ultrasonic water meter LCD pattern elements. Initially, image processing techniques are used to segment the images, extracting a dataset of independent LCD pattern elements; this dataset is input into the ConvNeXt model for training. Subsequently, a user interface is designed to visualize the detection results. The study presents an automatic detection system capable of identifying 18 types of LCD pattern elements with an accuracy rate of up to 99.94%. The ConvNeXt model demonstrates exceptional recognition precision in the automatic detection of ultrasonic water meter LCD pattern elements, indicating its significant practical value and application potential.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960C (2024) https://doi.org/10.1117/12.3050539
The importance of popularity in assessing the value of visual content is particularly notable in contemporary social media contexts. The demand for optimizing image set compression based on projected popularity has surged. However, the relationship between intrinsic visual content and its influence on popularity is not yet fully understood. In our study, a new approach is introduced called the Intrinsic Image Popularity Assessment (IIPA) scheme, which leverages deep neural networks (DNNs) to predict popularity. The IIPA model is trained using a learning-to-rank approach. Due to the lack of a pre-existing IIPA dataset, a probabilistic method is developed to automatically generate millions of image pairs with discernible popularity differences, creating the first large-scale IIPA database. Comprehensive testing demonstrates that our proposed IIPA model surpasses current methods and achieves human-level performance.
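The abstract does not detail its learning-to-rank training; a common pairwise formulation (a hypothetical sketch, not the authors' model) scores each image and converts the score difference into the probability that one image out-ranks the other, which is exactly the quantity a pairwise ranking loss supervises:

```python
import math

def pair_popularity_prob(score_a, score_b):
    """Probability that image A is intrinsically more popular than image B,
    modeled as a logistic function of the score difference (pairwise ranking)."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))

p = pair_popularity_prob(2.0, 0.0)  # A scores higher, so p > 0.5
```

Training then pushes the DNN's scores so that, for each automatically generated pair with a known popularity gap, this probability approaches the observed outcome.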
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960D (2024) https://doi.org/10.1117/12.3050430
In connected multi-vehicle systems, collaborative perception and fusion technologies compensate for the limitations inherent in single-vehicle sensors by effectively combining the filtering and tracking results from multiple sensors. However, existing multi-vehicle fusion methods do not adequately address the inherent inaccuracies of single-vehicle data fusion or the differences in perception data among vehicles in real-world multi-vehicle scenarios. To address these challenges, we propose a fusion framework for real connected multi-vehicle scenarios. First, perceived target data are associated using the Unscented Kalman Filter (UKF), and trajectories are generated with the Hungarian algorithm. Second, the generated trajectories are deduplicated and fused using a Hausdorff-distance method with average-distance complementation. Experiments conducted on real-world scenarios show a notable enhancement in fused target position accuracy and motion-state accuracy. This improvement can significantly contribute to a better overall driving experience through enhanced driver assistance, human-machine interaction, and other in-vehicle applications.
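The abstract names the Hausdorff distance as the basis for trajectory deduplication. A minimal NumPy implementation of the symmetric Hausdorff distance between two 2-D trajectories (the function name and data layout are assumptions; the paper's average-distance complementation is not reproduced):

```python
import numpy as np

def hausdorff(traj_a, traj_b):
    """Symmetric Hausdorff distance between two 2-D point trajectories:
    the worst-case nearest-neighbour distance in either direction."""
    a = np.asarray(traj_a, dtype=float)
    b = np.asarray(traj_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Two parallel trajectories one unit apart.
dist = hausdorff([(0, 0), (1, 0)], [(0, 1), (1, 1)])  # -> 1.0
```

A small Hausdorff distance between trajectories reported by two vehicles suggests they track the same target, so one copy can be dropped or the pair fused.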
Xiaolin Wang, Zhenkun Zhang, Chunxia Jiang, Haolin Li, Zejun Jiang, Jiayu Zou, Ming Liu, Zhen Liu, Chang Liu
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960E (2024) https://doi.org/10.1117/12.3050470
Vision-based industrial chip positioning is very important for surface-mount technology. Because of the particular structure of the BGA package, its localization is well suited to image processing. In this paper, a complete sub-pixel positioning algorithm flow for BGA chips is proposed and a new smoothing method is introduced. To obtain sub-pixel coordinates, we use eight directional templates to find the interpolation direction, derive the Lagrange interpolation formula to calculate the sub-pixel position, and then use the least-squares method to calculate the solder-pad positions and the chip rotation angle. Experiments show that the accuracy of our algorithm is better than that of traditional pixel-level detection methods.
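The paper's full 8-template derivation is not reproduced in the abstract; as a simplified 1-D illustration of Lagrange-interpolation-based sub-pixel localization, the classic three-point parabolic-vertex formula (a sketch under that simplification, not the authors' method) is:

```python
def subpixel_peak(y_prev, y_peak, y_next):
    """Sub-pixel offset of a peak from three samples, via the vertex of the
    Lagrange (parabolic) interpolant through (-1, y_prev), (0, y_peak), (1, y_next).
    Returns an offset in (-0.5, 0.5) relative to the central pixel."""
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0:
        return 0.0  # flat neighbourhood: no refinement possible
    return 0.5 * (y_prev - y_next) / denom

offset = subpixel_peak(1.0, 4.0, 3.0)  # -> 0.25 (true peak lies right of center)
```

In 2-D, the same idea is applied along the interpolation direction selected by the directional templates, refining the integer-pixel detection to sub-pixel accuracy.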
Yingping Sun, Dong Wang, Dengtian Bai, Yan Liu, Rongzhen Miao
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960F (2024) https://doi.org/10.1117/12.3050648
In light of the escalating threat posed by lasers, there is a growing imperative to investigate high-performance laser detection systems. When juxtaposed with non-imaging alternatives, imaging laser detection systems offer the distinct advantage of superior angular measurement accuracy. However, they concurrently impose heightened demands on system efficacy. This study undertakes theoretical computations pertaining to the pivotal technical parameters and signal-to-noise ratio of imaging laser detection systems. It furnishes the methodologies and outcomes of these calculations, subsequently subjecting them to empirical validation on the ground. Through experimentation, the validity of the established theoretical computation approach was affirmed, highlighting its utility in informing the optical system design and detector selection processes of laser detection systems. This integrative approach bears significant implications for advancing the overall performance inquiry of such devices.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960G (2024) https://doi.org/10.1117/12.3050603
With the rapid development of artificial intelligence and big-data technologies, the healthcare industry is undergoing a digital transformation. This research applies artificial intelligence to improve the efficiency of healthcare delivery and the patient experience. Malignant tumors seriously endanger people's lives and health, so early screening, diagnosis, and treatment are very important. Cancer diagnosis depends on pathological diagnosis: tissue biopsy and imaging observation of tumor cells are important means of clinical diagnosis. In recent years, great progress has been made in machine learning methods for histopathological image analysis, especially deep learning-based methods that improve diagnostic efficiency and accuracy. The image segmentation methods adopted in this study include both traditional and deep learning methods, using models such as UNet and Transformer. A dataset obtained from Kaggle is preprocessed and augmented, and the TransUNet and UNet models are used to segment the dataset and compute statistics. The results show that TransUNet detects more cells in most cases, while UNet may be more accurate in estimating cell size and coverage. The UNet-based segmentation technique also achieves good results on small datasets and can serve as a basic step for further analysis of cancer cells.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960H (2024) https://doi.org/10.1117/12.3050750
In intelligent transportation systems, pedestrian intentions at intersections must be classified to enhance pedestrian safety and optimize traffic flow. Accurately predicting pedestrian actions can significantly reduce traffic accidents and improve overall traffic management efficiency. An improved YOLOv5 model is proposed to efficiently and accurately identify pedestrian intentions. The model combines the Multi-Scale Dilated Attention (MSDA) mechanism, which extends attention to different scales to capture the subtle movements of pedestrians, and the GhostNet lightweight module, which reduces the number of computational parameters and makes the model suitable for real-time applications. In addition, a new loss function, ACFloss, which combines adaptive weight-focusing loss based on contextual relationships with a spatio-temporal consistency loss, is designed to better understand behavioral intent. Experimental results show that the improved YOLOv5 model achieves a classification accuracy of 93.7%, 4.5% higher than the benchmark, and the inference speed is improved by 35%. This improvement provides a reliable solution for intelligent transportation systems in complex intersection scenarios.
Intelligent Target Recognition and Detection Technology
Qifan Guo, Lizhe Liu, Xiaobo Guo, Kai Li, Xiaoyu Dong, Ning Pan
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960I (2024) https://doi.org/10.1117/12.3051197
The Feature Pyramid Network (FPN) is an enhancement that helps a CNN express image features. The traditional feature pyramid cannot fully transfer shallow detail information to deep semantic features, which leads to insufficient feature fusion and degrades the learning of visual tasks. To address these issues, this paper proposes a multi-scale balanced aggregation network (MBA-Net). On top of the FPN backbone, MBA-Net fully integrates the features of each level, promoting full utilization of the original image information. In addition, we further enhance feature expression with an attention mechanism, which strengthens effective features by reducing the information redundancy of feature maps at different scales. We conduct experiments on the PASCAL VOC2012 and MS COCO2014 datasets and verify the effectiveness of MBA-Net for feature fusion.
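As background for the FPN baseline that MBA-Net builds on, the top-down pathway can be sketched with nearest-neighbour upsampling and element-wise addition (a toy sketch; function names and the two-level example are hypothetical, and real FPNs also apply 1x1 lateral and 3x3 output convolutions):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def top_down_fuse(levels):
    """FPN-style top-down pathway: start from the deepest (smallest) map and
    add each upsampled result into the next shallower level."""
    fused = [levels[-1]]
    for lateral in reversed(levels[:-1]):
        fused.append(lateral + upsample2x(fused[-1]))
    return fused[::-1]  # shallow-to-deep order

# Two pyramid levels: a 4x4 shallow map and a 2x2 deep map.
p3, p4 = top_down_fuse([np.ones((4, 4)), np.ones((2, 2))])
```

The abstract's critique is the reverse direction: this pathway carries deep semantics downward but does not propagate shallow detail upward, which is the gap MBA-Net's balanced aggregation targets.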
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960J (2024) https://doi.org/10.1117/12.3050541
In the field of smart agriculture, the rapid and accurate detection of grape leaf diseases is crucial, especially for early-stage small lesions. To enhance the efficiency of detecting grape leaf diseases in resource-limited environments, this study presents an optimized lightweight model based on the YOLOv8n framework, named YOLO-BMCA. By integrating the upgraded ADown module and incorporating CGBlock enhancements, the model not only improves the efficiency of capturing both local and global information but also significantly reduces the consumption of computational resources. The adoption of BiFPN strengthens the multi-level feature fusion, while the MLCA mechanism focuses on enhancing the capture of key features, thereby increasing accuracy. In target detection of grape leaf diseases, YOLO-BMCA achieved an accuracy rate of up to 94.7%, with improvements of 2%, 1.1%, and 3.3% in the recall rate, mAP50, and mAP50-90 respectively, compared to the original algorithm. Meanwhile, the model's parameter volume and size were also significantly reduced by 58.6% and 55%, respectively, demonstrating the strong applicability of the optimized model in resource-constrained environments.
Haitao Liu, Lifen Wang, Zeng Gao, Xiuqian Li, Yanxia Yang, Jun Li
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960K (2024) https://doi.org/10.1117/12.3050441
Target detection technology plays a crucial role in the aerospace field, and recognizing satellites from few samples is of great significance for ensuring national space security. However, the limited computing resources of satellites restrict the application of intelligent algorithms. To overcome this issue, this paper proposes a lightweight improvement of the YOLOv5s detection algorithm with the introduction of an attention mechanism. First, the lightweight GhostNet module is used to modify the backbone, serving as the primary feature extraction network of the model. The Coordinate Attention EMA module is introduced in the neck network to enhance the feature representation capability for detecting target satellites, thereby improving detection accuracy in complex backgrounds. Simulations based on actual sampled data show that the algorithm's parameter count is reduced by 27.6%, achieving an mAP@0.5:0.95 of 91.21% and a detection rate of 400 FPS. This effectively enhances the model's multi-scale feature representation and fusion capabilities, improving target detection accuracy.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960L (2024) https://doi.org/10.1117/12.3050745
Previous early-prediction methods for frost damage and crack growth on expressway pavement give poor predictions because they only denoise the collected pavement images. Therefore, an early-prediction method based on infrared thermal imaging technology was designed. An image acquisition device collects highway pavement images, and grayscale conversion and noise reduction are applied to improve image quality. With infrared thermal imaging, the length and width characteristics of pavement frost damage and crack growth are extracted, and early prediction is realized by predicting and screening the growth path. In experimental tests, the designed method achieved a prediction error rate of 0.17%, lower than that of previous early-prediction methods, demonstrating a better prediction effect.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960M (2024) https://doi.org/10.1117/12.3050535
This paper proposes a deep learning-based food image recognition system to enhance checkout efficiency in university cafeterias. Addressing the bottlenecks of traditional manual checkout methods, a food image classification model has been designed and implemented, combining Convolutional Neural Networks (CNN) and Fully Connected Networks (FCN). The system extracts image features through convolutional layers, reduces feature map dimensions using pooling layers, and makes classification decisions through fully connected layers. Additionally, data augmentation techniques were utilized to expand the training data, and optimization methods such as regularization and Dropout were introduced to improve the model's generalization ability. The main structure of the paper includes sections on system architecture design, data preparation and processing, experimental results, and analysis. The system architecture design section details the construction methods of CNN and FCN and their application in this system. The data preparation and processing section describes the process of obtaining raw data, data augmentation, and preprocessing. The experimental results and analysis section demonstrates the impact of different parameter settings on model performance and proves the effectiveness of the proposed method through experimental data. The system is characterized by its efficient food image classification ability and good generalization performance. Extensive experiments on the dataset revealed that increasing the sample size and setting hyperparameters appropriately can significantly improve the model's accuracy and stability. With a sample size of 10,000, the model achieved an average accuracy of 86.88%, and the low standard deviation indicates stable performance in practical applications.
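The dimension bookkeeping behind "convolutional layers extract features, pooling layers reduce feature map dimensions" follows the standard output-size formula; the 64x64 input and the layer stack below are hypothetical, chosen only to illustrate the arithmetic:

```python
def conv_out(n, k, s=1, p=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# A hypothetical stack: 64x64 input, two stages of conv(3x3, pad 1) + 2x2 max-pool
n = 64
for _ in range(2):
    n = conv_out(n, k=3, p=1)   # padded 3x3 conv keeps the size
    n = conv_out(n, k=2, s=2)   # 2x2 stride-2 pool halves it
print(n)  # 16
```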
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960N (2024) https://doi.org/10.1117/12.3050685
To improve the accuracy of 3D point cloud target detection, this research presents an improved PointPillars method leveraging spatial attention and channel mechanisms. First, we integrate a refined point cloud feature representation into the pillar feature network, enhancing feature encoding and improving the distinctiveness of each point's representation. Second, we incorporate a spatial attention mechanism into the pseudo-image processing. This mechanism recalibrates the feature weights of spatially encoded points, boosting the algorithm's capability to extract critical features and enhancing detection performance. Experimental validation on the widely used KITTI dataset demonstrates significant improvements in detection accuracy and stability compared to the original algorithm.
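The pillar feature network at the heart of PointPillars starts by binning the raw point cloud into vertical columns over an x-y grid. A toy sketch of that binning step (the grid size and extent are made up for illustration):

```python
import numpy as np

def pillarize(points, grid=(4, 4), extent=2.0):
    """Assign each (x, y, z) point to a pillar: a vertical column in an
    x-y grid over [0, extent) x [0, extent). Returns a pillar index per point."""
    cell = extent / grid[0]
    ix = np.clip((points[:, 0] / cell).astype(int), 0, grid[0] - 1)
    iy = np.clip((points[:, 1] / cell).astype(int), 0, grid[1] - 1)
    return ix * grid[1] + iy

pts = np.array([[0.1, 0.1, 0.5], [0.1, 0.2, 1.0], [1.9, 1.9, 0.0]])
print(pillarize(pts))  # first two points share a pillar, the third is elsewhere
```

In the full method, the points in each pillar are then encoded by a small per-point network and max-pooled into a pseudo-image cell, which is where the spatial attention described above is applied.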
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960O (2024) https://doi.org/10.1117/12.3050417
To address the issues of existing algorithms related to inaccurate estimation of rectangular boxes and tracking box drift caused by fast-moving objects, leading to the loss of tracking targets, we propose a UAV object tracking algorithm based on the α-EIOU-NET. Initially, we introduce the EIOU-NET, which employs EIOU as a metric for state estimation to optimize parameters. EIOU-NET constraints focus on minimizing deviations in center position and the width and height of predicted rectangular boxes by the tracker. Subsequently, we introduce α to dynamically adjust the weights of loss values and gradients to expedite model convergence, termed α-EIOU. This adaptation significantly enhances both convergence speed and positioning accuracy. Finally, we utilize a spatial-channel attention mechanism and a lightweight feature extraction backbone network, ResNet34, to optimize features, ensuring a balance between algorithm speed and accuracy. Validation experiments were conducted using the UAV123 and Visdrone2019 UAV datasets, along with the GOT10k dataset. Experimental results demonstrate the effectiveness and feasibility of the algorithm, showcasing strong performance in diverse tracking environments, achieving a running speed of 30 fps.
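The EIOU metric used for state estimation penalizes, beyond plain IoU, the center-point distance and the width and height deviations of the predicted box, each normalized by the enclosing box. A minimal numeric sketch (the α exponent of α-EIOU would be applied on top of these terms; box coordinates are made up):

```python
import numpy as np

def eiou_loss(b1, b2):
    """EIOU loss for two (x1, y1, x2, y2) boxes: 1 - IoU plus penalties on
    center distance and on width/height differences."""
    ix1, iy1 = np.maximum(b1[:2], b2[:2])
    ix2, iy2 = np.minimum(b1[2:], b2[2:])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    iou = inter / (a1 + a2 - inter)
    # smallest enclosing box
    ex1, ey1 = np.minimum(b1[:2], b2[:2])
    ex2, ey2 = np.maximum(b1[2:], b2[2:])
    cw, ch = ex2 - ex1, ey2 - ey1
    # squared center distance
    rho2 = ((b1[0] + b1[2] - b2[0] - b2[2]) / 2) ** 2 \
         + ((b1[1] + b1[3] - b2[1] - b2[3]) / 2) ** 2
    dw = (b1[2] - b1[0]) - (b2[2] - b2[0])
    dh = (b1[3] - b1[1]) - (b2[3] - b2[1])
    return 1 - iou + rho2 / (cw ** 2 + ch ** 2) + dw ** 2 / cw ** 2 + dh ** 2 / ch ** 2

same = np.array([0, 0, 2, 2], dtype=float)
print(eiou_loss(same, same))  # 0.0 for identical boxes
```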
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960P (2024) https://doi.org/10.1117/12.3050446
The working environment of wind turbine gearboxes is complex and variable, with strong noise, which makes traditional fault diagnosis methods inadequate for accurate fault identification. To address this issue, this paper proposes a fault diagnosis method based on wavelet packet denoising combined with a CNN-Swin Transformer-LSTM. First, the original signal is decomposed, denoised, and reconstructed using wavelet packets to highlight the effective periodic impact components, and the reconstructed signal is converted into two-dimensional wavelet time-frequency images. Convolutional Neural Networks (CNN) then extract basic feature information from the images. The feature maps are input into a Swin Transformer model to automatically extract multi-scale feature information via the self-attention mechanism. Long Short-Term Memory (LSTM) networks are then employed to capture temporal features of the data. Additionally, the Convolutional Block Attention Module (CBAM) is introduced to enhance feature representation capability. Finally, the method classifies different fault types. Experimental verification shows that the proposed method achieves accuracies of 99.62% and 99.46% on two working-condition datasets, respectively. Under strong noise and variable working conditions, the fault diagnosis accuracy reaches 92.24% and 96.16%. These results demonstrate that the model possesses strong feature learning capability, robust anti-interference ability, and good generalization performance, exhibiting superior diagnostic performance and reliability compared to other existing diagnosis techniques.
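Wavelet-based denoising keeps the approximation coefficients and shrinks the detail coefficients before reconstruction. A toy one-level Haar version with soft thresholding (the paper uses full wavelet-packet decomposition, which this sketch does not reproduce; the signal and threshold are made up):

```python
import numpy as np

def haar_soft_denoise(x, thresh):
    """One-level Haar transform, soft-threshold the detail coefficients,
    then invert. A toy stand-in for wavelet-packet denoising."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)  # soft threshold
    out = np.empty_like(x)
    out[0::2] = (a + d) / np.sqrt(2)       # inverse transform
    out[1::2] = (a - d) / np.sqrt(2)
    return out

sig = np.array([1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0])
print(haar_soft_denoise(sig, 0.2))  # small sample-to-sample jitter suppressed
```

With a zero threshold the transform is perfectly invertible, which is a quick sanity check on the implementation.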
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960Q (2024) https://doi.org/10.1117/12.3050892
In video multi-object tracking scenes, the problems of target occlusion and track crossing still exist. Limited by the quality of the detector, traditional association algorithms lead to missed and false tracking and increase the number of target ID switches and trajectory fragmentations. Aiming at these problems, this paper proposes an optimized algorithm based on the Trajectory Poisson Multi-Bernoulli Mixture Filter framework: it combines a joint detection and embedding algorithm to output both the targets' detections and features, and designs new association and update algorithms to improve trajectory maintenance. The results show that the proposed algorithm effectively decreases the number of ID switches and maintains trajectories when tracking through short-time occlusions and crossings. It is tested on MOT datasets and achieves positive results compared with other algorithms.
Zhe Geng, Bilendo Delvis Da Vida, Chongqi Xu, Shiyu Zhang, Xiang Yu, Daiyin Zhu
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960R (2024) https://doi.org/10.1117/12.3050540
Unlike computer vision datasets, aerial images often come with prior knowledge of the distance between the object of interest and the sensors, which makes it possible to recover target sizes before making any predictions. On the other hand, these images usually contain a large number of both high-value out-of-library (OOL) targets and clutter, making it very challenging to strike a balance between recall and precision. To effectively combine the complementary information in aerial images collected by synthetic aperture radar (SAR) systems and infrared (IR) sensors mounted on drones, a novel two-stage cross-modality vehicle detection and classification strategy is proposed. In Stage 1, all potential targets are sorted into a large candidate pool by jointly considering the target features exhibited in both the SAR and IR images, so that targets with low radar cross section (RCS) are preserved. In Stage 2, multichannel convolutional neural network (CNN) models identify the in-library targets from multiview target chips while rejecting all others as either OOL objects of interest or clutter. Experimental results show that the proposed two-stage multiview cross-modality network architecture offers superior target detection and classification performance compared to single-stage deep neural networks relying solely on IR or SAR data.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960S (2024) https://doi.org/10.1117/12.3050403
This paper discusses a dangerous driving behavior recognition method based on convolutional neural networks (CNN). With the increase in car ownership, dangerous driving behavior has become a major hidden danger to road traffic safety. To improve road safety, this paper summarizes the research background, significance, and specific methods of using CNN technology to identify dangerous driving behavior. In the research process, the driving behavior data are first preprocessed to enhance the model's generalization ability; in the training stage, annotated data are used to optimize the model parameters. The recognition process of driving behavior is then analyzed, and a lightweight image classification model is constructed. Finally, three typical behavior recognition models are quantitatively compared to analyze their advantages and application scenarios, providing a useful reference for accurately identifying users' dangerous driving behaviors and strong support for establishing a safe-driving early warning model.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960T (2024) https://doi.org/10.1117/12.3050406
Aiming at the difficulty of extracting key features caused by complex background information and the many types of scenes and targets in stamen images, the Multi-Head Self-Attention (MHSA) module is added to the YOLOv8 backbone network, improving the backbone's ability to extract key features. To address the significant target scale variations and the presence of small targets in stamen images, we introduce a specialized small target detection layer. This enhancement allows the model to focus more on detecting small targets, thereby improving overall detection performance. By strengthening the YOLOv8 model's small target detection capability and enabling efficient fusion of the multi-scale, multi-layer features extracted by the backbone, detection performance is significantly enhanced. Ablation experiments show that the proposed algorithm improves precision and recall by 2.7% and 2.9%, respectively, and the mAP50 reaches 99.3%, meeting the requirements for real-time, accurate positioning of stamens.
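The self-attention at the core of an MHSA module weights every position against every other via a softmax over scaled dot products. A single-head numpy sketch (sequence length and dimensions are arbitrary; a real MHSA module runs several such heads in parallel over image patches):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a sequence x of
    shape (n, d): softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                     # 4 tokens, dim 8
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (4, 8)
```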
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960U (2024) https://doi.org/10.1117/12.3050443
To maintain a good living and learning environment and protect students' health on campus, detecting students' smoking behavior is particularly important. With the continuous development of artificial intelligence, object detection algorithms based on computer vision are becoming increasingly mature. However, due to the small size of cigarette targets, existing object detection algorithms suffer serious missed and false detections and are difficult to use in a real campus environment. Based on the YOLOv10 object detection algorithm, this paper proposes a multi-level end-to-end detection model, M-YOLOv10. To address the limited feature extraction capability of the original YOLOv10, a self-attention mechanism is introduced to improve the model's global feature extraction. At the same time, the feature fusion module is improved to achieve multi-level fusion and strengthen small object detection. Experimental results show that, compared with the model before improvement, the M-YOLOv10 algorithm improves mAP@0.5 by 17.6% while the detection speed remains essentially unchanged.
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960V (2024) https://doi.org/10.1117/12.3050436
Aiming at the problem that the surface of pile fabrics is soft and delicate, making defects difficult to detect, this paper proposes a defect detection model for pile fabrics based on an improved EfficientNet. A deep neural network using EfficientNet as its core feature extraction network is devised, incorporating an enhanced I-SPP spatial pyramid pooling module to improve adaptability and accommodate input data across various scales. The SE module and Swish activation function in the MBConv module are replaced with the CA attention module and Mish activation function to improve model robustness and detection efficiency. Experiments show that the improved EfficientNet model proposed in this article achieves a mean average accuracy of 88.5% on the test set. Compared with other neural network architectures, this method improves the accuracy and efficiency of detection and can meet the needs of related production enterprises.
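The Swish-to-Mish swap mentioned above exchanges two smooth, non-monotonic activations; both are one-liners, which makes the substitution cheap to try:

```python
import numpy as np

def swish(x):
    """Swish: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))

def mish(x):
    """Mish: x * tanh(softplus(x)); similar shape to Swish but with a
    slightly different negative tail."""
    return x * np.tanh(np.log1p(np.exp(x)))

x = np.linspace(-3, 3, 7)
print(np.round(swish(x), 3))
print(np.round(mish(x), 3))
```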
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960W (2024) https://doi.org/10.1117/12.3050392
This design employs the scientific software VOFA+ for graphical debugging of embedded hardware, with the objective of producing the upper unit of an intelligent trolley. The upper unit establishes a Bluetooth connection to the trolley and can modify its control parameters in real time; additionally, it plots the waveforms of the operating data in real time in a designated window. The upper unit designed in this paper can adjust the trolley's PID parameters in real time, view the waveform changes of its running data directly in the drawing window, and judge whether the modified parameters are reasonable. The intelligent vehicle is also equipped with an MPU6050 six-axis sensor, and the obtained IMU data are used to reconstruct the cart's movement position and display its trajectory in real time on the upper unit. Data from both simulation and actual tests were statistically analyzed, demonstrating that all functions of the design have been achieved.
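The real-time PID tuning the upper unit exposes can be illustrated with a minimal discrete PID loop; the gains, time step, and first-order plant below are hypothetical, not the trolley's actual dynamics:

```python
class PID:
    """Minimal discrete PID controller: u = Kp*e + Ki*sum(e)*dt + Kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, setpoint, measured):
        err = setpoint - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# drive a toy first-order plant (speed integrates the control input)
# toward a setpoint of 1.0
pid, speed = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.01), 0.0
for _ in range(5000):
    speed += pid.step(1.0, speed) * 0.01
print(round(speed, 3))
```

In the actual system, the operator would adjust kp/ki/kd over Bluetooth and watch the resulting speed waveform converge (or oscillate) in the VOFA+ plotting window.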
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960X (2024) https://doi.org/10.1117/12.3050397
As one of the most important natural resources, forests play an irreplaceable role in human development and survival. Most traditional tree detection methods rely on manual field surveys. This paper proposes using a UAV equipped with LiDAR to collect tree point cloud data on site, then applying a deep learning method, the PointNet network, to extract forest tree features and segment individual trees, thereby achieving effective tree detection.
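PointNet's key property is that a shared per-point transform followed by a symmetric max-pool yields a global feature invariant to point ordering, which is what makes it suitable for unordered LiDAR clouds. A toy demonstration (a single random linear map stands in for PointNet's learned per-point MLP):

```python
import numpy as np

def pointnet_global_feature(points, w):
    """Toy PointNet: a shared per-point linear map followed by a symmetric
    max-pool, making the global feature order-invariant."""
    return np.max(points @ w, axis=0)

rng = np.random.default_rng(1)
pts = rng.standard_normal((100, 3))           # a point cloud of 100 points
w = rng.standard_normal((3, 16))              # stand-in for the shared MLP
f1 = pointnet_global_feature(pts, w)
f2 = pointnet_global_feature(pts[::-1], w)    # same cloud, reversed order
print(np.allclose(f1, f2))  # True
```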
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960Y (2024) https://doi.org/10.1117/12.3050413
The traditional object detection algorithm YOLOv8 faces challenges when robotic pollination needs to identify crops, such as misdetection and omission due to small crop size, dense planting, vegetation cover, and background confusion. To address these issues, an improved algorithm based on YOLOv8 is proposed to enhance detection performance. First, a small object detection layer is added to enhance the learning of small objects' feature information. Second, BiFPN replaces FPN and PAN for multi-scale feature fusion. Finally, two existing hard sample mining methods, Loss Rank Mining (LRM) and focal loss (FL), are introduced to adjust the optimized loss function and achieve hard sample mining, improving the detection of small crops. Comparative and ablation tests are carried out on tomato flower image datasets derived from a simulated autonomous pollination robot sensing system to verify the efficacy of the revised method. The results demonstrate that the improved YOLOv8 algorithm improves the average accuracy by 9.787% and achieves more accurate detection of small crops.
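Focal loss, one of the two hard-sample mining methods introduced above, down-weights well-classified examples so training gradient concentrates on hard ones. A minimal binary version (γ and α use the common defaults, which may differ from the paper's settings):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy scaled by (1 - p_t)^gamma, which
    shrinks the contribution of easy (high-confidence) examples."""
    p_t = np.where(y == 1, p, 1 - p)
    a_t = np.where(y == 1, alpha, 1 - alpha)
    return -a_t * (1 - p_t) ** gamma * np.log(p_t)

# an easy positive (p = 0.9) vs a hard positive (p = 0.1)
easy = focal_loss(np.array([0.9]), np.array([1]))[0]
hard = focal_loss(np.array([0.1]), np.array([1]))[0]
print(round(easy, 4), round(hard, 4))
```

The hard example contributes orders of magnitude more loss than the easy one, which is exactly the mining effect described.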
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 133960Z (2024) https://doi.org/10.1117/12.3050477
Ranch management is gradually becoming intelligent with the development of the times, and livestock tracking technology is crucial to realizing that intelligence. To achieve fast tracking of sheep, a sheep tracking method based on an improved DeepSORT algorithm is designed. First, the tracking pipeline is improved by adding one round of IOU matching with a lowered threshold; during trajectory matching, spatial and positional similarities and correlations are considered simultaneously, which improves the accuracy and robustness of target matching; and the state vectors of the Kalman filter are improved to strengthen tracking robustness. Experiments on the dataset show that the improved DeepSORT algorithm achieves a tracking accuracy of 94.6%, which is 5.3%, 7.55%, 4.35%, and 2.2% higher than the ByteTrack, JDE, SORT, and DeepSORT algorithms, respectively, and it generalizes very well. The algorithm can track sheep accurately and quickly, providing new technical support for animal welfare.
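The extra round of IOU matching added to the DeepSORT pipeline associates remaining tracks with detections by box overlap. A simplified greedy sketch (DeepSORT itself uses Hungarian assignment with cascaded matching; the boxes and threshold below are made up):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def greedy_iou_match(tracks, dets, thresh=0.3):
    """Greedy IoU association: pair each track with its best unused
    detection above the threshold."""
    pairs, used = [], set()
    for ti, t in enumerate(tracks):
        scores = [(iou(t, d), di) for di, d in enumerate(dets) if di not in used]
        if scores:
            s, di = max(scores)
            if s >= thresh:
                pairs.append((ti, di))
                used.add(di)
    return pairs

tracks = [[0, 0, 10, 10], [20, 20, 30, 30]]
dets = [[21, 21, 31, 31], [1, 1, 11, 11]]
print(greedy_iou_match(tracks, dets))  # [(0, 1), (1, 0)]
```

Lowering `thresh`, as the improved pipeline does for its added round, lets slightly drifted boxes (e.g. after a short occlusion) still re-associate instead of spawning a new ID.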
Tenghui Wang, Xiaofeng Zhang, Yan Ma, Yapeng Wang, Haijun Xie, Mingchao Zhu, Binghua Su, Dong Yao
Proceedings Volume Third International Conference on Image Processing, Object Detection, and Tracking (IPODT 2024), 1339610 (2024) https://doi.org/10.1117/12.3050416
At present, YOLO-based algorithms have been widely used in urban planning, traffic monitoring, ecological protection, military security, and other fields, and their applicable scenarios continue to expand. In view of the high misdetection rate, high omission rate, and insufficient accuracy in image target detection tasks, this work studies an improved target detection algorithm based on YOLO V7. To reduce time cost and computing resource consumption, an anchor-free design is introduced, and by optimizing the decoupled head, classification and regression tasks are processed independently to improve the efficiency of feature extraction. On this basis, the CBATM attention mechanism is used to better capture the inter-correlations between features and improve the representational ability of the model. For the loss function, this paper adopts the SimOTA method from YOLOX to realize dynamic allocation of positive samples, which greatly reduces training time. The improved YOLO V7 algorithm was evaluated on the PASCAL VOC challenge public dataset VOC2007; the results show that it achieves higher detection accuracy and efficiency than the original YOLO V7, with the mean average precision (mAP) increased by 2.27%, and it is more accurate and efficient than other classic target detection algorithms.