Scene text image super-resolution aims to produce high-resolution, readable text images that support scene text recognition. Although existing deep-learning-based methods have made significant progress, shallow information is often neglected during super-resolution as the depth of the neural network increases. To address this issue, we propose a multi-level text feature enhancement super-resolution network (TESRN). TESRN adopts a coarse-to-fine feature extraction strategy and consists mainly of a shallow feature enhancement block (SFEB) and a multi-level feature fusion and extraction block (MFFEB). In SFEB, we design a wavelet-transform-based framework that extracts coarse high-frequency signals; it operates in parallel with convolution to perform shallow feature extraction. In MFFEB, we propose sequential group convolution blocks (SGCBs) built on group convolution and an attention mechanism, and multi-level text features are generated step by step by stacking SGCBs. To capture text features comprehensively, we introduce a bottleneck attention mechanism (BAM) that performs feature selection in both the spatial and channel dimensions, helping to select the features most relevant to text restoration. Finally, we conduct extensive experiments on the TextZoom dataset to evaluate TESRN. The results demonstrate that TESRN achieves high-quality image restoration and significantly improves the recognition accuracy of low-resolution text images in downstream text recognition tasks. Notably, our model outperforms existing methods in recognition accuracy on the easy test subset, which further indicates that TESRN effectively exploits the shallow features of text images and highlights the crucial role of shallow features in text reconstruction.
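The abstract does not state which wavelet SFEB uses or how the wavelet output is combined with the convolution branch. As a rough illustration of what "extracting coarse high-frequency signals" with a wavelet transform means, the minimal sketch below assumes a single-level 2D Haar DWT (the function name `haar_dwt2` is hypothetical, not from the paper). It splits an image into one low-frequency approximation and three high-frequency detail subbands; it is not the authors' SFEB.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level orthonormal 2D Haar DWT over non-overlapping 2x2 blocks.

    Returns (LL, LH, HL, HH): one low-frequency approximation and three
    high-frequency detail subbands, each half the input size along each axis.
    Naming and sign conventions for the detail bands differ between libraries.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll: Matrix = []
    lh: Matrix = []
    hl: Matrix = []
    hh: Matrix = []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            row_ll.append((a + b + c + d) / 2)  # scaled local average (low frequency)
            row_lh.append((a + b - c - d) / 2)  # top vs. bottom: responds to horizontal edges
            row_hl.append((a - b + c - d) / 2)  # left vs. right: responds to vertical edges
            row_hh.append((a - b - c + d) / 2)  # diagonal variation
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


# Usage: a 4x4 patch with a vertical stroke boundary inside the first 2x2 column of blocks.
patch = [[0, 1, 1, 1] for _ in range(4)]
ll, lh, hl, hh = haar_dwt2(patch)
print(ll)  # [[1.0, 2.0], [1.0, 2.0]]
print(hl)  # [[-1.0, 0.0], [-1.0, 0.0]]
```

Only the blocks that straddle the stroke boundary produce nonzero detail coefficients, which is why the detail subbands are a cheap source of edge (high-frequency) information. Because the Haar transform here is orthonormal, the sum of squared coefficients across all four subbands equals that of the input. In a network, the detail subbands could be stacked as extra channels next to a convolutional branch, but the abstract does not specify how TESRN fuses them.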
Convolution
Feature extraction
Super resolution
Image restoration
Discrete wavelet transforms
Performance modeling
Image processing