Dock targets in remote sensing images are characterized by slender structures and arbitrary orientations. General target detection algorithms based on convolutional neural networks cannot effectively capture a target's orientation information and therefore cannot meet the practical requirements of dock detection. To resolve these problems, this study designed an arbitrary-orientation deep convolutional neural network architecture based on the YOLOv4 algorithm. First, a multidimensional coordinate method was used to annotate the dock targets so that the network could encode their orientation information. Second, the loss function of the algorithm was optimized to make it suitable for oriented target detection. Finally, an attention mechanism was introduced to enhance the algorithm's feature-extraction ability and further improve its detection accuracy. Two remote sensing datasets for dock target detection were selected for experiments, and the results showed that the improved YOLOv4 network outperformed the other networks on the dock target detection task.
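The abstract describes annotating docks with orientation information and adapting the loss for oriented boxes. As a minimal illustrative sketch (not the paper's actual formulation), an oriented box can be parameterized as (cx, cy, w, h, angle), with a smooth-L1 regression loss whose angle term is wrapped to account for the π-periodicity of a rectangle's orientation; `oriented_box_loss` below is a hypothetical name:

```python
import math

def oriented_box_loss(pred, target):
    """Toy regression loss for an oriented box (cx, cy, w, h, angle_rad).

    A minimal sketch of an angle-aware loss; the paper's exact loss is
    not reproduced here, and width/height exchange at 90-degree rotations
    is ignored for simplicity.
    """
    def smooth_l1(x):
        x = abs(x)
        return 0.5 * x * x if x < 1.0 else x - 0.5

    # Standard smooth-L1 on the four geometric terms (cx, cy, w, h).
    geom = sum(smooth_l1(p - t) for p, t in zip(pred[:4], target[:4]))
    # Wrap the angle residual into [-pi/2, pi/2): a rectangle rotated by
    # pi is geometrically identical, so only the wrapped residual matters.
    d = (pred[4] - target[4] + math.pi / 2) % math.pi - math.pi / 2
    return geom + smooth_l1(d)
```

For example, a prediction whose angle differs from the target by exactly π incurs no angle penalty, since the two boxes coincide geometrically.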
Scene classification is an important tool for remote sensing image interpretation, with fundamental applications in research and industry. However, because of complex backgrounds and scale variations, remote sensing images exhibit large intraclass diversity and interclass similarity, which makes accurate classification challenging. We propose a scene classification method using joint learning and multiscale attention to alleviate these problems. To fully exploit the multiscale information in an image and improve the method's adaptability to objects of various sizes, and unlike general methods that simply fuse features at different scales for classification, joint learning over multiscale features is developed to optimize the whole network. Specifically, we leverage a pretrained deep convolutional neural network as the feature extractor to obtain low-level, medium-level, and high-level feature maps from the images. Then, because the low-level and medium-level feature maps carry weaker semantics than the high-level feature maps, we design a multiscale attention module to enhance the semantic information and suppress noise. Finally, global mean pooling is used to obtain the feature vectors, a separate classifier is applied to each feature vector, and decision-level fusion is adopted to obtain more reliable predictions. Experimental results on the AID and NWPU-RESISC45 datasets show that the proposed method significantly improves overall accuracy over the baselines, reaching 97.49% and 95.20% on the two datasets, respectively, which is state-of-the-art performance. The code will be made public at a GitHub repository available at https://github.com/Cbanyungong/JLMSAF.
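The pipeline described above can be sketched in miniature with NumPy: each scale's feature map passes through a toy channel-attention gate, is global-average-pooled into a feature vector, classified by its own linear classifier, and the per-scale softmax scores are averaged for decision-level fusion. All names, shapes, and the specific attention form here are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(feat, w):
    """Toy channel attention: reweight each channel of a (C, H, W) map by
    a sigmoid gate computed from its global average descriptor; `w` is a
    hypothetical learned (C, C) matrix."""
    gap = feat.mean(axis=(1, 2))          # (C,) per-channel descriptors
    gate = sigmoid(w @ gap)               # (C,) attention weights
    return feat * gate[:, None, None]

def fused_prediction(feats, att_ws, clf_ws):
    """Attend each scale, global-average-pool, classify per scale with a
    (num_classes, C) linear classifier, then average the per-scale
    softmax scores (decision-level fusion)."""
    probs = [softmax(clf @ channel_attention(f, w).mean(axis=(1, 2)))
             for f, w, clf in zip(feats, att_ws, clf_ws)]
    return np.mean(probs, axis=0)
```

Averaging softmax scores rather than concatenating features keeps each scale's classifier independent, which is one common way to realize decision-level fusion.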