Abstract:
[Purpose/Significance] This study applies Named Entity Recognition (NER) to ancient Chinese texts in order to advance their digitization, support the extraction and analysis of key information, improve access to and understanding of cultural heritage, and help promote traditional culture.
[Method/Process] We propose an NER method for ancient Chinese texts that integrates large language models (LLMs) with multimodal features. First, an LLM is used for data augmentation to generate a richer set of samples. Next, the text is segmented into fixed-length subsequences with a sliding window, and these subsequences are fed into an encoding layer to obtain textual feature representations. For the character-shape (glyph) modality, a Convolutional Neural Network (CNN) extracts local glyph features, and an improved Iterated Dilated Convolutional Neural Network (IDCNN) captures long-range dependencies, yielding global glyph information. Finally, the textual and glyph features are concatenated in a feature perception layer to form a comprehensive representation of each character, which is passed to a Conditional Random Field (CRF) layer for sequence labeling to produce the entity predictions. Using "Zuo Zhuan" and CHED_NER as research corpora, we construct recognition tasks for named entities such as personal names, place names, and temporal expressions.
[Result/Conclusion] Experimental results show that, compared with the mainstream BERT-BiLSTM-CRF method, the proposed method improves the F1 score by 13.32% and 1.03% on the "Zuo Zhuan" and CHED_NER corpora, respectively. Integrating large language models with multimodal features therefore enables accurate named entity recognition in ancient Chinese texts.
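The sliding-window segmentation step can be illustrated with a minimal sketch. The abstract does not give the window length, the stride, or how predictions from overlapping windows are recombined, so all of these are assumptions here: `sliding_windows` and `merge_window_tags` are hypothetical helpers, the last window is aligned to the end of the text so every character is covered, and for overlapping positions the tag is taken from the window in which that character sits farthest from the window edge (that is, with the most surrounding context).

```python
from typing import List, Tuple


def sliding_windows(chars: List[str], window: int, stride: int) -> List[Tuple[int, List[str]]]:
    """Split a character sequence into fixed-length windows.

    Returns (start_offset, window_chars) pairs. `window` and `stride` are
    hypothetical hyperparameters, not values reported in the paper.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        raise ValueError("stride larger than window would skip characters")
    n = len(chars)
    if n <= window:
        return [(0, list(chars))]
    starts = list(range(0, n - window + 1, stride))
    if starts[-1] + window < n:
        starts.append(n - window)  # extra window flush with the end of the text
    return [(s, chars[s:s + window]) for s in starts]


def merge_window_tags(n: int, windowed: List[Tuple[int, List[str]]]) -> List[str]:
    """Recombine per-window tag sequences into one tag per character.

    Assumed rule: for a character covered by several windows, keep the tag
    from the window where it is farthest from either window edge.
    """
    best: List[Tuple[int, str]] = [(-1, "O")] * n
    for start, tags in windowed:
        for i, tag in enumerate(tags):
            margin = min(i, len(tags) - 1 - i)  # distance to nearest window edge
            if margin > best[start + i][0]:
                best[start + i] = (margin, tag)
    return [tag for _, tag in best]


if __name__ == "__main__":
    text = list("鄭伯克段于鄢")
    for start, chunk in sliding_windows(text, window=4, stride=2):
        print(start, "".join(chunk))
```

Usage note: with `window=4` and `stride=2`, the six-character sentence above yields the windows `鄭伯克段` (offset 0) and `克段于鄢` (offset 2), and the overlapping characters `克段` receive predictions from both windows, which `merge_window_tags` then reconciles.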