High-resolution remote sensing image classification has widespread applications, but accurate land-type classification usually relies heavily on labeled samples, which are difficult and time-consuming to obtain. To reduce this dependence on labeled samples, as well as computation time and resource consumption, we propose an approach that combines a context-aggregated transformer-based network (TSNet) with a differentiable feature clustering method for unsupervised remote sensing image classification. The algorithm is an end-to-end network model consisting of a feature extractor (TSNet), a classifier, and an argmax function. Its loss function combines the feature similarity loss and the spatial continuity loss of differentiable feature clustering to address the limitations of fixed boundaries, eliminating the need for training data. TSNet combines the strengths of convolutional neural networks (CNNs) and transformers: it uses a transformer module to obtain hierarchical features, a CNN decoder to aggregate context, and a multi-branch prediction head with classifiers from three CNNs to generate the classification map, enhancing supervision in deeper layers. The approach was tested on a public dataset (LoveDA) and compared with UNet-Net, CNN-Net, KMeans, and the iterative self-organizing data analysis technique algorithm (ISODATA). The experimental results show that our method achieves the best results in terms of the adjusted Rand index, adjusted mutual information, Fowlkes-Mallows index (FMI), Dice, Jaccard, and frequency-weighted intersection over union metrics.
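The loss described above pairs a feature similarity term with a spatial continuity term. The abstract does not give the exact formulas, so the following is only a minimal NumPy sketch under stated assumptions: the similarity term is taken to be the cross-entropy between the softmax of the classifier response map and its own argmax pseudo-labels, and the continuity term is taken to be the mean absolute difference between neighboring pixels of the response map. The function name `clustering_loss`, the weight `mu`, and the `(H, W, q)` array layout are hypothetical and not taken from the article.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def clustering_loss(response, mu=1.0):
    """Sketch of a differentiable-feature-clustering style loss.

    response: array of shape (H, W, q), the classifier output per pixel
              (q candidate clusters). Layout is an assumption.
    mu:       hypothetical weight balancing the two terms.
    Returns (total, similarity, continuity, labels).
    """
    # Pseudo-labels: the argmax over the cluster axis.
    labels = response.argmax(axis=-1)

    # Feature similarity term (assumed form): cross-entropy between the
    # softmax response and the pseudo-labels derived from it.
    probs = softmax(response)
    h, w, _ = response.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    picked = probs[rows, cols, labels]
    similarity = -np.log(picked).mean()

    # Spatial continuity term (assumed form): mean absolute difference
    # between vertically and horizontally adjacent response vectors.
    dy = np.abs(response[1:, :, :] - response[:-1, :, :]).mean()
    dx = np.abs(response[:, 1:, :] - response[:, :-1, :]).mean()
    continuity = dy + dx

    return similarity + mu * continuity, similarity, continuity, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = rng.normal(size=(8, 8, 4))  # stand-in for a classifier output
    total, sim, con, labels = clustering_loss(demo, mu=1.0)
    print(total, sim, con, labels.shape)
```

In this sketch the pseudo-labels come from the network's own output, which is why no labeled training data is required; in an actual training loop the same quantities would be computed in an automatic differentiation framework so that gradients flow back into the feature extractor and classifier.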
CITATIONS
Cited by 2 scholarly publications.
Image classification
Remote sensing
Feature extraction
Transformers
Education and training
Image processing
RGB color model