Named Entity Recognition for Ancient Chinese Texts Using LLMs and Multimodal Features

Authors: Meng Jiana ¹, Li Fengyi ¹, Liu Shuang ¹, Zhao Di ¹, Wang Bolin ¹
Institute:

1. Dalian Minzu University
Corresponding author: Meng Jiana (孟佳娜) Email: mengjn@dlnu.edu.cn
Submit Time: 2024-11-18 11:21:12

Abstract: [Purpose/Significance] This study applies Named Entity Recognition (NER) to ancient Chinese texts in order to advance their digitization, support the extraction and analysis of important information, improve access to and understanding of cultural heritage, and promote traditional culture. [Method/Process] We propose an NER method for ancient Chinese texts that integrates large language models with multimodal features. First, a large language model is used for data augmentation to generate richer samples. The text is then segmented into fixed-length subsequences with a sliding window, and these subsequences are fed into an encoding layer to obtain feature representations of the text. A Convolutional Neural Network (CNN) extracts local features of the character shapes, and an improved Iterative Dilated Convolutional Neural Network (IDCNN) captures long-range features, yielding global information about the character shapes. Finally, the text features and shape features are concatenated at a feature perception layer to form a comprehensive representation of each character, and this representation is passed to a Conditional Random Field (CRF) layer for sequence labeling to complete entity prediction. Using "Zuo Zhuan" and CHED_NER as the research corpora, we constructed recognition tasks for named entities such as personal names, geographical names, and temporal expressions. [Result/Conclusion] Experimental results show that, compared with the mainstream BERT-BiLSTM-CRF method, the proposed method improves F1 by 13.32% and 1.03%, respectively. These results indicate that integrating large language models with multimodal features enables accurate named entity recognition in ancient Chinese texts.
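The sliding-window segmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the window length, the stride, or how the tail of a text is handled, so the function name `sliding_windows`, the default values, and the end-aligned final window below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def sliding_windows(chars: str, window: int = 8, stride: int = 4) -> List[str]:
    """Split a character sequence into fixed-length, overlapping subsequences.

    `window` and `stride` are hypothetical parameters; the source only says
    that the text is cut into fixed-length subsequences with a sliding window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if stride > window:
        raise ValueError("stride must not exceed window, or characters are skipped")
    # Short inputs are returned as a single (shorter) piece.
    if len(chars) <= window:
        return [chars]

    pieces = []
    start = 0
    while start + window < len(chars):
        pieces.append(chars[start:start + window])
        start += stride
    # Assumption: align the last window to the end of the text so that
    # every piece has exactly `window` characters.
    pieces.append(chars[len(chars) - window:])
    return pieces


if __name__ == "__main__":
    # Placeholder input; a real run would use sentences from the corpus.
    for piece in sliding_windows("abcdefghij", window=4, stride=2):
        print(piece)
```

Overlapping windows are one common way to keep an entity that straddles a cut point intact in at least one subsequence; whether the proposed method uses overlap is not stated in the abstract.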

Keywords: Ancient Chinese Texts; Entity Recognition; Iterative Dilated Convolutional Neural Network; Large Language Model; Feature Fusion

From: Li Fengyi (李丰毅)
Subject: Computer Science >> Natural Language Understanding and Machine Translation
Contribution: Accepted
Cite as: ChinaXiv:202411.00196 (or this version ChinaXiv:202411.00196V2)
DOI:10.12074/202411.00196
CSTR:32003.36.ChinaXiv.202411.00196
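TXID: b9fe4832-80e0-4d2c-96a7-78f3e358a83f
Recommended references: Meng Jiana, Li Fengyi, Liu Shuang, Zhao Di, Wang Bolin. Named Entity Recognition for Ancient Chinese Texts Using LLMs and Multimodal Features (融合大语言模型与多模态特征的古文命名实体识别). [DOI:10.12074/202411.00196]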

Version History

[V3]	2024-11-20 09:59:08	ChinaXiv:202411.00196v3
[V2]	2024-11-18 11:21:12	ChinaXiv:202411.00196V2
[V1]	2024-11-15 10:04:28	ChinaXiv:202411.00196v1


