16 August 2023 Text error correction after text recognition based on MacBERT4CSC
Yanzhi Guan, Ziliang Pang, Yehui Ding, Longlong Tian
Proceedings Volume 12787, Sixth International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2023); 127872K (2023) https://doi.org/10.1117/12.3004939
Event: 6th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE 2023), 2023, Shenyang, China
Abstract
Text recognition technology is now widely applied, but the limitations of the recognition models themselves and interference from environmental factors lead to a high recognition error rate. To address this, a text error correction method based on MacBERT4CSC, applied after text recognition, is proposed. First, errors are detected in the text to be corrected by applying thresholds to the single-word recognition confidence, which yields the suspected wrong words and their positions. Next, the constructed confusion set is traversed first, as the priority route for correction. After the traversal is complete, the MacBERT4CSC model recalls candidate words for the suspected wrong positions. Finally, the candidates whose word code similarity and MacBERT4CSC model score meet the set conditions are sorted, and the correction is completed from the sorted candidates. The dataset is built by imitating the error types of word recognition, and the test set to be corrected is obtained by calling Ali's word recognition API. Comparison experiments show that the MacBERT4CSC recall and word bar code sorting method improves accuracy, recall, and F1 value compared with other deep models and traditional rule-based error correction methods, which verifies the effectiveness of the method.
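The steps named in the abstract (confidence-threshold detection, confusion set first, model recall, then filtering and sorting) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the threshold values, the shape of the confusion set (a mapping from a misrecognized character to its correction), the sort key, and the toy `recall` and `similarity` stand-ins are all hypothetical. In the paper, candidate recall is performed by the MacBERT4CSC model and similarity is computed on a word code; neither is reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[str, float]  # (candidate character, model score)


def detect_suspects(confidences: List[float], threshold: float) -> List[int]:
    """Return positions whose recognition confidence is below the threshold."""
    return [i for i, conf in enumerate(confidences) if conf < threshold]


def correct_text(
    text: str,
    confidences: List[float],
    confusion_set: Dict[str, str],
    recall: Callable[[List[str], int], List[Candidate]],
    similarity: Callable[[str, str], float],
    conf_threshold: float = 0.8,   # assumed value
    sim_threshold: float = 0.5,    # assumed value
    score_threshold: float = 0.5,  # assumed value
) -> str:
    chars = list(text)
    for pos in detect_suspects(confidences, conf_threshold):
        original = chars[pos]
        # Step 1: the confusion set takes priority over the model.
        if original in confusion_set:
            chars[pos] = confusion_set[original]
            continue
        # Step 2: recall candidates for this position (model stand-in).
        scored = [
            (cand, score, similarity(original, cand))
            for cand, score in recall(chars, pos)
        ]
        # Step 3: keep candidates that satisfy both conditions.
        kept = [c for c in scored if c[1] >= score_threshold and c[2] >= sim_threshold]
        # Step 4: sort (here by similarity, then score) and take the best.
        if kept:
            kept.sort(key=lambda c: (c[2], c[1]), reverse=True)
            chars[pos] = kept[0][0]
    return "".join(chars)


# Toy stand-ins; a real system would query MacBERT4CSC and a word-code table.
def toy_recall(chars: List[str], pos: int) -> List[Candidate]:
    return [("o", 0.9), ("a", 0.7), ("x", 0.2)]


def toy_similarity(a: str, b: str) -> float:
    table = {("q", "o"): 0.8, ("q", "a"): 0.6}
    return table.get((a, b), 0.0)


recognized = "he1lo wqrld"
confidences = [0.99, 0.98, 0.40, 0.97, 0.99, 0.99, 0.95, 0.55, 0.96, 0.97, 0.98]
corrected = correct_text(
    recognized, confidences, {"1": "l"}, toy_recall, toy_similarity
)
print(corrected)
```

In this toy run, position 2 is fixed from the confusion set and position 7 falls through to the recalled candidates, where the low-scoring candidate is filtered out before sorting. Passing `recall` and `similarity` as parameters keeps the control flow separate from whichever model and encoding are actually used.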
(2023) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Yanzhi Guan, Ziliang Pang, Yehui Ding, and Longlong Tian "Text error correction after text recognition based on MacBERT4CSC", Proc. SPIE 12787, Sixth International Conference on Advanced Electronic Materials, Computers, and Software Engineering (AEMCSE 2023), 127872K (16 August 2023); https://doi.org/10.1117/12.3004939
KEYWORDS
Error control coding
Error analysis
Optical character recognition
Statistical modeling
Data modeling
Image processing
Target recognition