Synthetic aperture radar (SAR) is an essential tool for ocean surveillance. As the main participants on the ocean, ships are the most important targets for ocean monitoring, so it is of great importance to develop ship detection algorithms for SAR sea images. Owing to their powerful feature representation abilities, algorithms based on convolutional neural networks (CNNs) perform far better on the ship detection task than traditional methods based on manual features. CNN-based algorithms can be divided into one-stage and two-stage algorithms. Two-stage algorithms have high accuracy but are relatively time-consuming. One-stage algorithms have high inference speed but lower accuracy than two-stage algorithms. In this article, we propose a modified one-stage detection algorithm that improves the accuracy of ship detection while still meeting the real-time requirement. First, the small model of the one-stage algorithm YOLOv5 is chosen as the base network to obtain high inference speed. Then, to improve the accuracy of ship detection with only a small increase in inference time and model parameters, we integrate a one-layer super-resolution architecture with the simplest structure into the YOLOv5 network. Finally, we conduct comparative experiments on our dataset to verify the performance of the modified YOLOv5. The experimental results show that the modified method achieves a higher Average Precision (AP) than the original YOLOv5 for detecting ships in SAR images, with only a small increase in inference time and model parameters.
Synthetic aperture radar (SAR) can operate in any type of inclement weather, which makes it a very suitable tool for ocean surveillance. Scene classification is an essential preliminary step for other computer vision tasks in ocean monitoring, so it is of great importance to develop scene classification technology for SAR sea images. Owing to the excellent feature representation abilities of neural networks, deep learning-based methods are far superior to traditional methods based on manual features in scene classification performance. Many lightweight classification networks have been proposed to improve inference speed, but compared with ordinary CNNs they have slightly lower accuracy on scene classification tasks. In this article, we propose an improved lightweight convolutional neural network for scene classification of SAR sea images. First, to meet the real-time requirement, we choose MobileNetv1 as the original classification network. Then, to compensate for the loss of accuracy, we use 1D asymmetric convolution kernels to strengthen each depthwise convolution layer in the network. Finally, after training, we merge the linear calculations of each layer so that the network is converted back into the original structure. The experimental results show that the modified model achieves higher accuracy than the original one on scene classification of SAR sea images without extra computation at inference time.
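The merging step described above rests on the linearity of convolution: parallel branches applied to the same input can be summed into one kernel. The following is a minimal NumPy sketch of that idea, assuming a 3×3 depthwise kernel strengthened by parallel 1×3 and 3×1 kernels. The kernel sizes, the function names, and the omission of batch-normalization folding are all assumptions made for illustration; the abstract does not give these details.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """Zero-padded 'same' cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

def fuse_asymmetric_kernels(k_square, k_row, k_col):
    """Add a 1x3 and a 3x1 kernel onto the centre row/column of a 3x3 kernel."""
    fused = np.array(k_square, dtype=float)
    fused[1, :] += np.ravel(k_row)   # 1x3 branch acts on the centre row
    fused[:, 1] += np.ravel(k_col)   # 3x1 branch acts on the centre column
    return fused

# Hypothetical example: three training-time branches vs. one fused kernel.
rng = np.random.default_rng(0)
image = rng.normal(size=(6, 7))
k_square = rng.normal(size=(3, 3))
k_row = rng.normal(size=(1, 3))
k_col = rng.normal(size=(3, 1))

three_branches = (correlate2d_same(image, k_square)
                  + correlate2d_same(image, k_row)
                  + correlate2d_same(image, k_col))
single_kernel = correlate2d_same(image, fuse_asymmetric_kernels(k_square, k_row, k_col))
print(np.allclose(three_branches, single_kernel))
```

Because the fused kernel has the same shape as the original 3×3 kernel, the deployed network in this sketch performs the same number of multiplications as the unmodified one.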
After posterior lumbar surgeries (PLS), changes in the cross-sectional area (CSA) and fatty infiltration (FI) of the paraspinal muscles can strongly affect the muscle activity pattern and spinal stability. The objective of this work is to perform automated segmentation of the paraspinal muscles (multifidus and erector spinae) in magnetic resonance imaging (MRI) images. However, no previous work has achieved semantic segmentation of the multifidus (MF) and erector spinae (ES), owing to three challenges: (1) the distribution of the paraspinal muscles overlaps with that of other anatomical structures; (2) the fascia between MF and ES is unclear; (3) the shape varies both within and between patients. In this paper, we propose a generative adversarial network called LPM-GAN, which contains a generator and a discriminator, to address these challenges. The generator handles the high variability and variety of the paraspinal muscles by extracting high-level image semantics while preserving the paraspinal muscle anatomy. The discriminator is then trained to optimize the predicted mask so that it is closer to the ground truth. Finally, we obtain the CSA and FI of the paraspinal muscles using Otsu's method. Extensive experiments on MRIs of 69 patients demonstrate that LPM-GAN achieves high Recall of 0.931 and 0.904, and Dice coefficient of 0.920 and 0.903, which indicates that the method is effective.
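The two reported metrics can be computed from a predicted mask and a ground-truth mask as sketched below. This uses the standard definitions of the Dice coefficient and Recall for binary masks; the function name and the toy masks are hypothetical and are not taken from the paper.

```python
import numpy as np

def dice_and_recall(pred_mask, true_mask):
    """Dice = 2|P∩T| / (|P| + |T|); Recall = |P∩T| / |T| for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    overlap = np.logical_and(pred, true).sum()
    total = pred.sum() + true.sum()
    dice = 2.0 * overlap / total if total > 0 else 1.0
    recall = overlap / true.sum() if true.sum() > 0 else 1.0
    return float(dice), float(recall)

# Toy 2x2 masks: the prediction covers the true pixel plus one extra pixel.
pred = [[1, 1], [0, 0]]
true = [[1, 0], [0, 0]]
dice, recall = dice_and_recall(pred, true)
print(round(dice, 4), recall)  # 0.6667 1.0
```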
Ship detection in infrared images has important military and civil applications. Because infrared images are not easy to acquire in large quantities, deep neural networks cannot be trained directly on infrared images; and if a model pre-trained on visible-light images is used directly for detection, targets are missed because of the different imaging conditions. To address this problem, this paper proposes a detection method that combines a deep convolutional neural network with salient regions. First, we propose a method for extracting salient regions based on anchors and a saliency map. Multiple new images are then formed from the salient regions, and the newly formed images and the original image are fed to the deep convolutional neural network for parallel processing. Finally, the individual detection results are merged by non-maximum suppression (NMS) to produce the final detection results. The comparison results show that the proposed method effectively reduces the missed-detection rate and thus improves detection accuracy.
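The final merging step can be illustrated with the standard greedy form of non-maximum suppression. This is a minimal NumPy sketch, not the paper's implementation: the box format `[x1, y1, x2, y2]`, the IoU threshold of 0.5, and the sample boxes are assumptions chosen for illustration.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """Intersection over union between one box and an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, and repeat."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(np.asarray(scores))[::-1]   # indices, best score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        if rest.size == 0:
            break
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Detections pooled from the original image and the salient-region images
# (hypothetical values): the first two boxes describe the same ship.
boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```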
Most visual trackers focus on short-term tracking, in which the target is always in the camera's field of view or only slightly occluded. Compared with short-term tracking, long-term tracking is a more challenging task. It requires the ability to capture the target in long sequences and to cope with frequent disappearances and reappearances of the target. Long-term tracking is therefore much closer to a realistic tracking system. However, few long-term tracking algorithms have been developed, and few have shown promising performance so far. We focus on a long-term visual tracking framework based on part correlation filters (CFs). Our framework is composed of a part-based short-term tracker and a re-detection module. First, multiple CFs are applied to locate the target collaboratively and to address partial occlusion (OCC). Second, our method updates each part adaptively based on its motion similarity and reliability score to retain robustness. Third, a switching strategy is designed to dynamically activate the re-detection module and to switch the search mode between local and global search. In addition, our re-detector is trained by sampling positive and negative samples around the reliably tracked target so that it adapts to appearance changes. Candidates from the re-detection module are verified before being accepted, which ensures the precision of recovery. Numerous experimental results demonstrate that the proposed tracking method performs favorably against state-of-the-art methods in terms of accuracy and robustness.
It is well known that robust visual tracking is difficult to achieve, since it is easily disturbed by scale variation, illumination variation, background clutter, occlusion, and so on. The spatio-temporal context algorithm nevertheless performs remarkably well, because it effectively exploits the spatial context information of the target. However, its ability to discriminate the target and to adapt to scale variation needs to be improved in complex scenes. Furthermore, because it lacks an appropriate target model update strategy, its tracking capability deteriorates over time. To tackle these problems, a multi-scale spatio-temporal context visual tracking algorithm based on adaptive target model update is proposed. First, histogram of oriented gradients features are adopted to describe the target and its surrounding regions, which improves discriminative ability. Second, a multi-scale estimation method is applied to predict the target scale variation. Then, the peak value and the average peak-to-correlation energy of the confidence map response are combined to evaluate the tracking status. When the status is stable, the current target is expressed in a low-rank form and a CUR filter is learned. Otherwise, the CUR filter is triggered to recapture the target. Finally, the experimental results demonstrate that the robustness of the algorithm is clearly improved and that its overall performance is better than that of the comparison algorithms.
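The tracking-status check can be sketched with the commonly used definition of average peak-to-correlation energy, APCE = |F_max − F_min|² / mean((F − F_min)²), where F is the confidence (response) map. The abstract does not state the exact formula or how the two measures are thresholded, so the definition, the function name, and the toy response maps below are assumptions.

```python
import numpy as np

def apce(response):
    """Average peak-to-correlation energy of a response map."""
    response = np.asarray(response, dtype=float)
    f_max, f_min = response.max(), response.min()
    energy = np.mean((response - f_min) ** 2)
    if energy == 0.0:            # perfectly flat map: no usable peak
        return 0.0
    return float((f_max - f_min) ** 2 / energy)

# A single sharp peak (confident tracking) versus two competing peaks.
sharp = np.zeros((5, 5))
sharp[2, 2] = 1.0
ambiguous = np.zeros((5, 5))
ambiguous[1, 1] = 1.0
ambiguous[3, 3] = 1.0

print(apce(sharp) > apce(ambiguous))  # True
```

A sharper, unimodal response gives a larger APCE, which is why the measure is useful for deciding whether the current result is reliable enough to update the model.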
Compared with short-term tracking, long-term tracking is a more challenging task. It requires the ability to capture the target in long sequences and to cope with frequent disappearance and reappearance of the target. Long-term tracking is therefore much closer to a realistic tracking system. However, few long-term tracking algorithms have been developed, and few have shown promising performance. In this paper, we focus on a long-term visual tracking framework based on parts with multiple correlation filters. First, multiple correlation filters are applied to locate the target collaboratively and to address partial occlusion in a local search region. Based on the confidence score between consecutive frames, our tracker determines whether the current tracking result is reliable. In addition, an online SVM detector is trained by sampling positive and negative samples around the reliably tracked target. A local-to-global search region strategy is adopted to switch between short-term and long-term tracking. When heavy occlusion or an out-of-view target causes tracking failure, the re-detection module is activated. Extensive experimental results on tracking datasets show that the proposed tracking method performs favorably against state-of-the-art methods in terms of accuracy and robustness.
Visual tracking plays a significant role in computer vision. Although numerous tracking algorithms have shown promising results, target tracking remains a challenging task due to appearance changes caused by deformation, scale variation, and partial occlusion. Part-based methods have great potential in addressing the deformation and partial occlusion issues. Owing to the addition of multiple part trackers, most of these part-based trackers cannot run in real time. Correlation filters have been used in target tracking owing to their high efficiency. However, the correlation filter-based trackers face great problems dealing with occlusion, deformation, and scale variation. To better address the above-mentioned issues, we present a scale adaptive part-based tracking method using multiple correlation filters. Our proposed method utilizes the scale-adaptive tracker for both root and parts. The target location is determined by the responses of root tracker and part trackers collaboratively. To estimate the target scale more precisely, the root scale and each part scale are predicted with the sequential Monte Carlo framework. An adaptive weight joint confidence map is acquired by assigning proper weights to independent confidence maps. Experimental results on the publicly available OTB100 dataset demonstrate that our approach outperforms other state-of-the-art trackers.
Target tracking is one of the most active research topics and one of the most important parts of computer vision. A typical deformable-model target tracking algorithm decomposes each target into multiple sub-blocks and computes both the similarity of the local areas of each target and the spatial relationship among the sub-blocks. However, these algorithms define the area and the number of sub-blocks manually. In practical applications, a tracking system can let the user select the tracking target interactively in real time, but it is difficult to provide interaction for selecting the sub-blocks. Manual selection of sub-blocks is therefore limited in practice. To address this problem, this paper presents a method for automatic sub-block segmentation. The proposed method integrates local contrast and the richness of texture detail into a measure function for sub-blocks. Saliency detection based on a visual attention model is used to extract salient local contrast, and edge direction dispersion is used to describe the richness of texture detail. The discrimination of each pixel in the target is then computed with these two measures. Finally, sub-blocks with high discrimination are chosen for tracking. Experimental results show that the proposed method achieves higher tracking precision than the current deformable target tracking algorithm with manually selected sub-blocks.
The correlation filter, previously used for object detection and recognition within a single image, has become a popular approach to visual tracking owing to its high efficiency and robustness. Many trackers based on the correlation filter, including Minimum Output Sum of Squared Error (MOSSE), Circulant Structure tracker with Kernels (CSK), and Kernelized Correlation Filter (KCF), simply estimate the translation of a target and provide no insight into its scale variation. In visual tracking, however, scale variation is one of the most common challenges, and it affects both the stability and the accuracy of tracking. It is therefore necessary to handle scale variation. In this paper, we present an accurate two-step scale estimation solution based on the KCF framework to handle changes in target scale. In addition to the original grayscale pixel feature, we integrate the powerful Histogram of Oriented Gradients (HOG) and Color Names (CN) features to further boost overall tracking performance. Finally, the experimental results demonstrate that the proposed method outperforms other state-of-the-art trackers.
Recently we have been concerned with locating and tracking targets in aerial videos. Targets in aerial videos usually have weak boundaries because of camera motion. Target detection requires the contour of the target, and an accurate contour also helps improve the accuracy of target tracking. Edge detection has brought some advances in this effort. However, noisy images and weak boundaries limit the performance of existing contour detection algorithms. After analyzing the structure and edge maps of the Holistically-nested Edge Detection (HED) network, we use the highest-level side output and improve the HED architecture. First, we crop and resize our images to 400×320 pixels. Second, we detect edges using our improved HED network. Finally, the contour of an object is found from the edges detected in the previous stage. We significantly decrease the running time by reducing the 5 side-output layers to only 1 and by replacing the fusion layer with a refinement and image processing module, which also improves the result. The experimental results show that our algorithm outperforms the state of the art on images with noise and weak boundaries.
It is challenging to capture a high-dynamic range (HDR) scene using a low-dynamic range camera. A weighted sum-based image fusion (IF) algorithm is proposed so as to express an HDR scene with a high-quality image. This method mainly includes three parts. First, two image features, i.e., gradients and well-exposedness are measured to estimate the initial weight maps. Second, the initial weight maps are refined by a guided filter, in which the source image is considered as the guidance image. This process could reduce the noise in initial weight maps and preserve more texture consistent with the original images. Finally, the fused image is constructed by a weighted sum of source images in the spatial domain. The main contributions of this method are the estimation of the initial weight maps and the appropriate use of the guided filter-based weight maps refinement. It provides accurate weight maps for IF. Compared to traditional IF methods, this algorithm avoids image segmentation, combination, and the camera response curve calibration. Furthermore, experimental results demonstrate the superiority of the proposed method in both subjective and objective evaluations.
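The weight estimation and the weighted-sum fusion can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: images are grayscale floats in [0, 1], the two features are combined by multiplication, well-exposedness is modeled as a Gaussian around mid-gray with `sigma = 0.2`, and the guided-filter refinement step is omitted. All function names are hypothetical.

```python
import numpy as np

def initial_weight(image, sigma=0.2, eps=1e-6):
    """Initial weight map from gradient magnitude and well-exposedness."""
    gy, gx = np.gradient(image)
    gradient = np.hypot(gx, gy)
    # Pixels near mid-gray (0.5) are treated as well exposed (assumption).
    exposedness = np.exp(-((image - 0.5) ** 2) / (2.0 * sigma ** 2))
    return gradient * exposedness + eps   # eps keeps every weight positive

def fuse(images):
    """Per-pixel normalized weighted sum of the source images."""
    stack = np.stack(images).astype(float)
    weights = np.stack([initial_weight(im) for im in stack])
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * stack).sum(axis=0), weights

# Hypothetical under-exposed and over-exposed versions of the same scene.
rng = np.random.default_rng(0)
scene = rng.random((16, 16))
under, over = 0.3 * scene, 0.7 + 0.3 * scene
fused, weights = fuse([under, over])
print(fused.shape, np.allclose(weights.sum(axis=0), 1.0))
```

Because the weights are normalized to sum to one at each pixel, every fused value is a convex combination of the source values at that pixel.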
For an optical guidance system flying at low altitude and high speed, calculating the turbulent convective heat transfer over its dome is the key to designing this kind of aircraft. Turbulence models based on the RANS equations are computationally efficient, and their accuracy satisfies engineering requirements. However, for flow in a shock layer where strong entropy and pressure disturbances exist, and especially for aerodynamic heating, some parameters in the RANS energy equation need to be modified. In this paper, we apply turbulence models to calculate the heat flux over the dome of a sphere-cone body at zero angle of attack. Based on Billig’s results, the shape and position of the detached shock were extracted from the flow field using a multi-block structured grid. The thermal conductivity of the inflow was set to a temperature-dependent kinetic theory model. Compared with Klein’s engineering formula at the stagnation point, the results of the turbulence models were larger. Our analysis shows that the main reason for the larger values is the interference of the entropy layer with the boundary layer. The thermal conductivity of the inflow was then assigned a fixed value, the equivalent thermal conductivity, to compensate for the overestimation of the turbulent kinetic energy. Numerical experiments based on the SST model showed that the value of the equivalent thermal conductivity depends only on the Mach number. The proposed modification of the equivalent thermal conductivity of the inflow can also be applied to other turbulence models.
Since the dome experiences convective heat loading, thermal stress is generated in the thickness direction. Estimating the thermal shock and analyzing the thermal shock resistance of the dome are therefore key to its design. In this paper, the thermal shock resistance of a CVD ZnS dome is analyzed for a flight condition of 6000 m altitude and Mach 3.0. We obtained the critical Reynolds number through a rocket sled experiment, from which we deduced that a transition from laminar to turbulent flow occurs somewhere over the dome. We calculated the heat transfer coefficient over the dome using an engineering formula for a high-speed sphere with a turbulent boundary layer near the stagnation point. The largest heat transfer coefficient is 2590 W/(m²·K). We then calculated the transient thermal stress of the dome using the finite element method and obtained the temperature and thermal stress distributions through the thickness at different times. To obtain the mechanical properties of CVD ZnS at high temperatures, the 3-point bending method was used to test its flexural strength at different temperatures. Comparing the maximum thermal stress with the flexural strength at each temperature, we find that the safety factors are not less than 1.75, which implies that the dome has a good safety margin under the proposed application condition. From the above tests and analysis, we conclude that the thermal shock resistance of the CVD ZnS dome satisfies the requirements of the flight conditions.
Tone mapping can be used to compress the dynamic range of image data so that it fits within the range of the reproduction media and human vision. The original infrared images captured with infrared focal plane arrays (IFPA) are high dynamic range images, so tone mapping is an important component of infrared imaging systems and has become an active topic in recent years. In this paper, we present a tone mapping framework using multi-scale retinex. First, a Conditional Gaussian Filter (CGF) is designed to suppress the "halo" effect. Second, the original infrared image is decomposed into a set of images that represent the mean of the image at different spatial resolutions by applying CGFs of different scales. A set of images that represent the multi-scale details of the original image is then produced by dividing the original image pointwise by each decomposed image. Third, the final detail image is reconstructed as a weighted sum of the multi-scale detail images. Finally, histogram scaling and clipping are applied to remove outliers and scale the detail image; 0.1% of the pixels are clipped at both extremities of the histogram. Experimental results show that the proposed algorithm efficiently increases local contrast while preventing the "halo" effect and provides a visually pleasing rendition.
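The final histogram clipping and scaling step can be sketched as below. The 0.1% clipping fraction comes from the abstract; implementing it with percentiles, the output range of [0, 1], and the function name are assumptions for illustration.

```python
import numpy as np

def clip_and_scale(detail, clip_percent=0.1):
    """Clip clip_percent % of pixels at each histogram end, then scale to [0, 1]."""
    detail = np.asarray(detail, dtype=float)
    low, high = np.percentile(detail, [clip_percent, 100.0 - clip_percent])
    if high == low:                       # flat image: nothing to stretch
        return np.zeros_like(detail)
    clipped = np.clip(detail, low, high)
    return (clipped - low) / (high - low)

# Hypothetical detail image with two extreme outliers.
detail = np.linspace(0.0, 1.0, 10000)
detail[0], detail[-1] = -1000.0, 1000.0
display = clip_and_scale(detail)
print(display.min(), display.max())  # 0.0 1.0
```

Without the clipping, the two outliers in this example would compress all other values into a tiny part of the output range.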
Affine invariant feature computation is an important part of statistical pattern recognition because affine
invariant features are robust, repeatable, distinguishable, and widely applicable. Multi-Scale
Autoconvolution (MSA) is a transform proposed by Esa Rahtu that yields complete affine invariant features.
Rahtu proved that the linear relationship of any four non-collinear points is affine invariant. The transform is based on a
probabilistic interpretation of the image function. The MSA transform performs well under image occlusion and
noise, but it is sensitive to illumination variation. To address this problem, an improved MSA transform is proposed in this
paper by computing the map of included angles between N-domain vectors. The proposed method is based on the
probabilistic interpretation of the N-domain vector included angle map. This map is built
by computing the included angles between vectors formed by an image point and its N-domain
image points. The approach is justified because the linear relationship of included angles between vectors formed by any four
non-collinear points is affine invariant. This paper gives a mathematical derivation of the method. The
transform values can be used as descriptors for affine invariant pattern classification. The main contribution of this
paper is the use of the N-domain vector included angle map, in which the N-domain vector included angle is taken as the
probability of the pixel. This computation adapts to illumination variation better than taking the gray value of
the pixel as the probability. We illustrate the performance of the improved MSA transform in various object classification
tasks. A comparison with descriptors based on the original MSA transform and with affine invariant moments shows that the
proposed method copes better with illumination variation, image occlusion, and image noise.
Multi-scale 2-D Gaussian filters have been widely used in feature extraction (e.g. SIFT, edges), image segmentation, image enhancement, image noise removal, multi-scale shape description, etc. However, their computational complexity remains an issue for real-time image processing systems. To address this problem, we propose an FPGA-based framework for multi-scale 2-D Gaussian filtering. First, a full-hardware architecture based on a parallel pipeline is designed to achieve a high throughput rate. Second, to save multipliers, the 2-D convolution is separated into two 1-D convolutions. Third, a dedicated first-in first-out memory named CAFIFO (Column Addressing FIFO) is designed to avoid the error propagation induced by clock glitches. Finally, a shared memory framework is designed to reduce memory costs. As a demonstration, we implemented a 3-scale 2-D Gaussian filter on a single ALTERA Cyclone III FPGA chip. Experimental results show that the proposed framework can compute a multi-scale 2-D Gaussian filtering within one pixel clock period and is therefore suitable for real-time image processing. Moreover, the main principle can be extended to other convolution-based operators, such as the Gabor filter and the Sobel operator.
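The separation of the 2-D convolution into two 1-D convolutions works because a 2-D Gaussian kernel is the outer product of two 1-D Gaussian kernels. The NumPy sketch below checks this in software; it is not a model of the FPGA architecture, and the kernel radius, sigma, zero-padded borders, and function names are assumptions. For a kernel of width k, the direct form needs k² multiplications per pixel and the separable form needs 2k.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    """Normalized 1-D Gaussian kernel of width 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def filter_separable(image, k1d):
    """Row pass followed by column pass (2k multiplications per pixel)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, k1d, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k1d, mode="same"), 0, rows)

def filter_direct(image, k1d):
    """Direct 2-D filtering with the outer-product kernel (k*k multiplications)."""
    k2d = np.outer(k1d, k1d)
    radius = len(k1d) // 2
    padded = np.pad(image, radius)        # zero borders, matching mode="same"
    out = np.zeros(image.shape, dtype=float)
    for i in range(k2d.shape[0]):
        for j in range(k2d.shape[1]):
            out += k2d[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 9))
k1d = gaussian_kernel_1d(sigma=1.0, radius=2)
print(np.allclose(filter_separable(image, k1d), filter_direct(image, k1d)))
```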
Owing to the limitations of infrared imaging components and to atmospheric radiation, infrared images suffer from low contrast, blur, and heavy noise. To address these problems, a multi-scale image enhancement algorithm is proposed. The main principle is as follows. First, on the basis of the multi-scale image decomposition, we use an edge-preserving spatial filter instead of the Gaussian filter proposed in the original version, and adjust the scale-dependent factor with weighted information. Second, contrast is equalized by applying nonlinear amplification. Third, each subband image is formed as the weighted sum of the sampled subband image and the subband image that has been subsampled and then upsampled by a factor of two. Finally, image reconstruction is applied. Experimental results show that the proposed method enhances the original infrared image effectively and improves its contrast, while also preserving the details and edges of the image well.
Point features and line features are basic elements of object feature sets, and they play an important role in object matching and recognition. Point features are sensitive to noise; moreover, an image usually contains a huge number of point features, which makes matching complex. Line features include straight line segments and curves. One difficulty in straight line segment matching is the uncertainty of endpoint locations; another is that a straight line segment may be fractured, or short segments may be joined into a long one. For curves, in addition to the above problems, it is difficult to describe the shape difference between curves quantitatively. These problems with point and line features affect the robustness and accuracy of target description. To address them, a plane geometry primitive representation method is proposed to describe the significant structure of an object. First, two types of primitives are constructed: the intersecting line primitive and the blob primitive. Second, a line segment detector (LSD) is applied to detect line segments, from which the intersecting line primitives are extracted. Finally, the robustness and accuracy of the plane geometry primitive representation method are studied. The method captures the structural information of the object well, even when the object is rotated or changes scale in the image. Experimental results verify the robustness and accuracy of this method.
Visual tracking is a critical task in many computer vision applications such as surveillance, vehicle tracking, and
motion analysis. The challenges in designing a robust visual tracking algorithm are caused by the presence of
background clutter, occlusion, and illumination changes. In this paper, we propose a visual tracking algorithm in a
particle filter framework to overcome these three challenging issues. The particle filter is an inference technique for
estimating the unknown motion state from a noisy collection of observations, so we employ it to learn the
trajectory of a target. The proposed algorithm relies on the learned trajectory to predict the position of a target in a
new frame, and corrects the prediction by a process that we call field transition. At the beginning of the tracking
stage, a set of disturbance templates around the target template are selected and defined as particles. During
tracking, the position of the tracked target is first predicted based on the learned motion state, and then we use the
normalized cross-correlation coefficient as the criterion to select the most suitable field transition parameters for the predicted
position from the corresponding parameters of the particles. Once the target is judged not to be occluded, we apply the
field transition with the selected parameters to correct the predicted position to the accurate location of the target.
Meanwhile, we use the calculated cross-correlation coefficient as posterior knowledge to update the weights of
all the particles for the next prediction. To evaluate the performance of the proposed tracking algorithm, we test
the approach on challenging sequences involving heavy background clutter, severe occlusions, and drastic illumination
changes. Comparative experiments demonstrate that this method achieves better
efficiency and accuracy than two previously proposed algorithms: the mean shift tracking algorithm (MS) and the
covariance tracking algorithm (CT).
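The matching criterion used above, the normalized cross-correlation coefficient, can be sketched as follows. This is the standard zero-mean definition; the function name and the sample patches are hypothetical. Subtracting the mean and dividing by the norms makes the score unchanged under a linear brightness change of the patch, which is relevant to the illumination changes discussed in the abstract.

```python
import numpy as np

def ncc(template, patch):
    """Zero-mean normalized cross-correlation coefficient in [-1, 1]."""
    t = np.asarray(template, dtype=float)
    p = np.asarray(patch, dtype=float)
    t = t - t.mean()
    p = p - p.mean()
    denom = np.sqrt((t ** 2).sum() * (p ** 2).sum())
    if denom == 0.0:              # a flat patch carries no structure to match
        return 0.0
    return float((t * p).sum() / denom)

rng = np.random.default_rng(0)
template = rng.random((5, 5))
brighter = 2.0 * template + 3.0   # same structure under a gain and offset
print(round(ncc(template, brighter), 6))  # 1.0
```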
Histogram of oriented gradients (HOG) is an efficient feature extraction scheme, and HOG descriptors are feature
descriptors that are widely used in computer vision and image processing for biometrics, target tracking,
automatic target detection (ATD), automatic target recognition (ATR), etc. However, HOG feature
extraction is ill-suited to hardware implementation because it includes complicated operations. In this paper, an optimized
design method and theoretical framework for real-time HOG feature extraction based on FPGA are proposed. The main
principle is as follows. First, a parallel gradient computing unit based on a parallel pipeline structure is
designed. Second, the calculation of the arctangent and square root operations is simplified. Finally, a histogram generator
based on a parallel pipeline structure is designed to calculate the histogram of each sub-region. Experimental results
show that HOG extraction can be completed within one pixel period by these computing units.
This paper proposes a difference-templates based target tracking method (DTBTTM) with the originality of constructing
a collection of difference templates that represent the varying characteristics of target region, such as translation, scale,
and illumination. DTBTTM method uses the linear combination of such difference templates to represent the variation of
target region, and computes coefficients with respect to the corresponding templates. The final target position and
window size can be determined with these coefficients. DTBTTM method simply solves linear equations, and is quite
different from correlation method in which 2-dimensional search is required to calculate similarity between pre-defined
template and the region of interest. Experimental results show that the DTBTTM is highly adaptable to the variation of
target region, and is robust to the variation of translation, scale, illumination, and even occlusion.