Category: Mathematics >> Modeling and Simulation Category: Linguistics and Applied Linguistics >> Linguistics and Applied Linguistics Category: Computer Science >> Natural Language Understanding and Machine Translation Submitted: 2025-06-03
Abstract: Natural language is considered closely intertwined with human cognition, with linguistic structures posited to offer profound insights into the cognitive system. However, as a coding system, natural language encodes diverse objects into unified forms; its prominent formal features, such as lexical combinatorial rules, capture people’s attention and tend to overshadow form-independent structures. Here, I present knowledge-level, logic-level, task-level, and model-level semantic structures inherent in natural language. These structures are discovered by shifting the research focus from the coding forms of natural language to the objects they encode, unveiling different semantic layers integrated within sentences. The cognitive functions of these structures are evident both in themselves and in models developed from them. I therefore introduce four models to demonstrate their capabilities in memorization, reasoning, learning, natural language generation, and understanding. These findings advance our understanding of natural language and provide a framework for investigating the cognitive system’s information processing through structural analysis of natural language.
Category: Physics >> General Physics: Statistical and Quantum Mechanics, Quantum Information, etc. Category: Computer Science >> Natural Language Understanding and Machine Translation Submitted: 2025-05-27
Abstract: The introduction of the transformer architecture in 2017 (cf. [VSP2017]) marked the most striking advancement in natural language processing. The transformer is a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. However, we believe there is a gap in our theoretical understanding of what the transformer is, and why it works physically. In this paper, from a physical perspective on modern chips, we construct physical models in the Fock space over the Hilbert space of tokens realizing large language models based on a transformer architecture as open quantum systems. Our physical models underlie the transformer architecture for large language models.
Category: Computer Science >> Natural Language Understanding and Machine Translation Category: Computer Science >> Computer Software Category: Computer Science >> Computer Application Technology Submitted: 2025-05-20
Abstract: Domain-specific QA systems require not just generative fluency but high factual accuracy grounded in structured expert knowledge. While recent Retrieval-Augmented Generation (RAG) frameworks improve context recall, they struggle with integrating heterogeneous data and maintaining reasoning consistency. To address these challenges, we propose DO-RAG, a scalable and customizable hybrid QA framework that integrates multi-level knowledge graph construction with semantic vector retrieval. Our system employs a novel agentic chain-of-thought architecture to extract structured relationships from unstructured, multimodal documents, constructing dynamic knowledge graphs that enhance retrieval precision. At query time, DO-RAG fuses graph and vector retrieval results to generate context-aware responses, followed by hallucination mitigation via grounded refinement. Experimental evaluations in the database and electrical domains show near-perfect recall and over 94% answer relevancy, with DO-RAG outperforming baseline frameworks by up to 33.38%. By combining traceability, adaptability, and performance efficiency, DO-RAG offers a reliable foundation for multi-domain, high-precision QA at scale.
Category: Physics >> General Physics: Statistical and Quantum Mechanics, Quantum Information, etc. Category: Computer Science >> Natural Language Understanding and Machine Translation Submitted: 2025-05-07
Abstract: This paper presents a mathematical formalism for generative artificial intelligence (GAI). Our starting point is the observation that a “histories” approach to physical systems agrees with the compositional nature of deep neural networks. Mathematically, we define a GAI system as a family of sequential joint probabilities associated with input texts and temporal sequences of tokens (as physical event histories, as in [Gudder1998, Isham1994]). From a physical perspective on modern chips, we then construct physical models realizing GAI systems as open quantum systems. Finally, as an illustration, we construct physical models in the Fock space over the Hilbert space of tokens realizing large language models based on a transformer architecture as open quantum systems.
Category: Computer Science >> Natural Language Understanding and Machine Translation Submitted: 2024-12-25
Abstract: Large language models (LLMs), adopted to understand human language, drive the development of artificial intelligence (AI) web search agents. Compared to traditional search engines, LLM-powered AI search agents are capable of understanding and responding to complex queries with greater depth, enabling more accurate operations and better context recognition. However, little attention and effort have been devoted to Chinese web search, so the capabilities of open-source models have not been uniformly and fairly evaluated. The difficulty lies in the lack of three things: a unified agent framework, an accurately labeled dataset, and a suitable evaluation metric. To address these issues, we propose Level-Navi Agent, a general-purpose and training-free web search agent based on level-aware navigation, accompanied by a well-annotated dataset (Web24) and a suitable evaluation metric. Level-Navi Agent can think through complex user questions and conduct searches across various levels on the internet to gather information for questions. Meanwhile, we provide a comprehensive evaluation of state-of-the-art LLMs under fair settings. To further facilitate future research, source code is available at GitHub.
Category: Computer Science >> Natural Language Understanding and Machine Translation Submitted: 2024-11-25
Abstract: Large language models (LLMs) have showcased exceptional capabilities across various natural language processing (NLP) tasks in recent years, such as machine translation, text summarization, and question answering. Despite their impressive performance, the deployment of these models on edge devices, such as mobile phones, IoT devices, and edge computing nodes, is significantly hindered by their substantial computational and memory requirements. This survey provides a comprehensive overview of the state-of-the-art techniques and strategies for enabling efficient inference of LLMs on edge devices. We explore approaches including the development of small language models (SLMs), model compression techniques, inference optimization strategies, and dedicated frameworks for edge deployment. Our goal is to highlight the advancements and ongoing challenges in this field, offering valuable insights for researchers and practitioners striving to bring the power of LLMs to edge environments.
Category: Biology >> Other Disciplines of Biology Category: Computer Science >> Computer Application Technology Submitted: 2024-11-12
Abstract: Paleontology, the study of past life, fundamentally relies on fossils to reconstruct ancient ecosystems and understand evolutionary dynamics. Trilobites, as an important group of extinct marine arthropods, offer valuable insights into Paleozoic environments through their well-preserved fossil records. Reconstructing trilobite behaviour from static fossils will set new standards for dynamic reconstructions in scientific research and education. Despite this potential, current computational methods for this purpose, such as text-to-video (T2V), face significant challenges in maintaining visual realism and consistency, which hinder their application in scientific contexts. To overcome these obstacles, we introduce an automatic T2V prompt learning method. Within this framework, prompts for a fine-tuned video generation model are generated by a large language model, which is trained using rewards that quantify the visual realism and smoothness of the generated video. Both the fine-tuning of the video generation model and the reward calculations make use of a collected dataset of 9,088 Eoredlichia intermedia fossil images, which serves as a common representative of the visual details of trilobites as a whole. Qualitative and quantitative experiments show that our method can generate trilobite videos with significantly higher visual realism than powerful baselines, promising to boost both scientific understanding and public engagement.
Category: Computer Science >> Computer Application Technology Submitted: 2024-08-05
Abstract: The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various image segmentation and multi-modal segmentation (e.g., text-to-mask) tasks, but also in the video domain. Additionally, the recently released SAM 2 is once again sparking research enthusiasm in the realm of promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in various image processing tasks; a comprehensive and in-depth review in the video domain is notably absent. To address this gap, this work conducts a systematic review of SAM for videos in the era of foundation models. As the first review of the progress of SAM for videos, this work focuses on its applications to various tasks by discussing its recent advances and the innovation opportunities in developing foundation models for broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas: video understanding, video generation, and video editing, analyzing and summarizing their advantages and limitations. Furthermore, we offer comparative results of SAM-based and current state-of-the-art methods on representative benchmarks, together with insightful analysis. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for video and beyond.
Category: Computer Science >> Information Security Category: Computer Science >> Computer Application Technology Submitted: 2024-04-23
Abstract: Image steganography has become a focal point of interest for researchers due to its capacity for the covert transmission of sensitive data. Traditional diffusion models often struggle with image steganography tasks involving paired data, as their core principle of gradually removing noise is not directly suited for maintaining the correspondence between carrier and secret information. To address this challenge, this paper conducts an in-depth analysis of the principles behind diffusion models and proposes a novel framework for an image steganography diffusion model. The study begins by mathematically representing the steganography tasks of paired images, introducing two optimization objectives: minimizing the secrecy leakage function and minimizing the embedding distortion function. Subsequently, it identifies three key issues that need to be addressed in paired image steganography tasks and, through specific constraint mechanisms and optimization strategies, enables the diffusion model to effectively handle paired data. This enhances the quality of the generated stego-images and resolves issues such as image clarity. Finally, on public datasets such as CelebA, the proposed model is compared with existing generation model-based image steganography techniques, analyzing its implementation effects and performance parameters. Experimental results indicate that, compared to current technologies, the model framework proposed in this study not only improves image quality but also achieves significant enhancements in multiple performance metrics, including the imperceptibility and anti-detection capabilities of the images. Specifically, the PSNR of its stego-images reaches 93.14 dB, and the extracted images’ PSNR reaches 91.23 dB, an approximate improvement of 30% over existing technologies; the attack success rate is reduced to 2.4 × 10^-38. These experimental outcomes validate the efficacy and superiority of the method in image steganography tasks.
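Note: the abstract above reports image quality as PSNR in dB but does not say how it was computed. As a reading aid, the following is a minimal sketch of the standard PSNR definition, 10·log10(MAX² / MSE). The function name `psnr`, the 8-bit peak value of 255, and the toy pixel values are assumptions for illustration and are not taken from the paper.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    max_value is the largest possible pixel value (255 assumed here for 8-bit images).
    """
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical flattened cover and stego pixels (illustrative values only).
cover = [52, 55, 61, 59]
stego = [52, 56, 61, 58]
print(round(psnr(cover, stego), 2))
```

A higher PSNR means the stego-image differs less from the cover image, which is why the abstract treats it as a proxy for imperceptibility.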
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2024-03-16
Abstract: A novel federated learning training framework for heterogeneous environments is presented, taking into account the diverse network speeds of clients in realistic settings. This framework integrates asynchronous learning algorithms and pruning techniques, effectively addressing the inefficiencies of traditional federated learning algorithms in scenarios involving heterogeneous devices, as well as tackling the staleness issue and inadequate training of certain clients in asynchronous algorithms. Through the incremental restoration of model size during training, the framework expedites model training while preserving model accuracy. Furthermore, enhancements to the federated learning aggregation process are introduced, incorporating a buffering mechanism to enable asynchronous federated learning to operate akin to synchronous learning. Additionally, optimizations in the process of the server transmitting the global model to clients reduce communication overhead. Our experiments across various datasets demonstrate that: (i) significant reductions in training time and improvements in convergence accuracy are achieved compared to conventional asynchronous FL and HeteroFL; (ii) the advantages of our approach are more pronounced in scenarios with heterogeneous clients and non-IID client data.
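Note: the buffering idea mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' algorithm: the class name `BufferedAggregator`, the buffer size, and the staleness discount 1 / (1 + staleness) are all assumptions chosen for illustration, and pruning and model-size restoration are not modeled. The sketch only shows how a server might collect asynchronous client updates in a buffer and aggregate them in one step, so that each aggregation resembles a synchronous round.

```python
class BufferedAggregator:
    """Toy server that aggregates asynchronous client updates in batches."""

    def __init__(self, global_weights, buffer_size):
        self.global_weights = list(global_weights)
        self.buffer_size = buffer_size
        self.buffer = []   # pending (weight, update) pairs
        self.round = 0     # number of aggregations performed so far

    def receive(self, client_weights, client_round):
        """Store one client update; aggregate when the buffer is full.

        client_round is the server round the client started from.
        Returns True if this call triggered an aggregation.
        """
        staleness = max(0, self.round - client_round)
        weight = 1.0 / (1.0 + staleness)  # assumed discount for stale updates
        self.buffer.append((weight, list(client_weights)))
        if len(self.buffer) < self.buffer_size:
            return False
        # Weighted average of all buffered updates, one coordinate at a time.
        total = sum(w for w, _ in self.buffer)
        self.global_weights = [
            sum(w * update[i] for w, update in self.buffer) / total
            for i in range(len(self.global_weights))
        ]
        self.buffer.clear()
        self.round += 1
        return True

server = BufferedAggregator(global_weights=[0.0, 0.0], buffer_size=2)
server.receive([1.0, 1.0], client_round=0)
server.receive([3.0, 5.0], client_round=0)
print(server.global_weights, server.round)
```

Waiting for a full buffer trades some latency for stability: a single slow or stale client cannot overwrite the global model on its own.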
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-28 Partner journal: Data Intelligence
Abstract: COVID-19 evolves rapidly, and an enormous number of people worldwide desire instant access to COVID-19 information such as an overview, clinical knowledge, vaccines, prevention measures, and COVID-19 mutations. Question answering (QA) has become the mainstream way for users to consume the ever-growing information by posing natural language questions. Therefore, it is urgent and necessary to develop a QA system that offers consulting services around the clock to relieve the stress on health services. In particular, people have paid increasing attention to complex multi-hop questions rather than simple ones during the lasting pandemic, but the existing COVID-19 QA systems fail to meet their complex information needs. In this paper, we introduce a novel multi-hop QA system called COKG-QA, which reasons across multiple relations over large-scale COVID-19 Knowledge Graphs to return answers given a question. In the field of question answering over knowledge graphs, current methods usually represent entities and schemas based on knowledge embedding models and represent questions using pre-trained models. While it is convenient to represent different knowledge (i.e., entities and questions) based on specified embeddings, an issue arises in that these separate representations come from heterogeneous vector spaces. We align question embeddings with knowledge embeddings in a common semantic space by a simple but effective embedding projection mechanism. Furthermore, we propose combining entity embeddings with their corresponding schema embeddings, which serve as important prior knowledge, to help search for the correct answer entity of specified types. In addition, we derive a large multi-hop Chinese COVID-19 dataset (called COKG-DATA for ease of reference) for COKG-QA based on the linked knowledge graph OpenKG-COVID19 launched by OpenKG, including comprehensive and representative information about COVID-19. COKG-QA achieves quite competitive performance on the 1-hop and 2-hop data while obtaining the best result with significant improvements on the 3-hop data. It is also more efficient when used in a QA system for users. Moreover, the user study shows that the system not only provides accurate and interpretable answers but also is easy to use and comes with smart tips and suggestions.
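Note: the abstract above describes projecting question embeddings into the same space as knowledge embeddings and combining entity embeddings with schema embeddings, without giving formulas. The following is a minimal sketch under stated assumptions: a linear projection matrix, element-wise addition of entity and schema vectors, and cosine similarity for ranking. All function names, vectors, and entity labels are hypothetical and are not taken from COKG-QA.

```python
import math

def project(matrix, vector):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_entities(question_vec, projection, entity_vecs, schema_vecs):
    """Rank candidate entities against a question in a shared space.

    Assumptions: the question is mapped by a linear projection, and each
    entity embedding is combined with its schema embedding by addition.
    """
    q = project(projection, question_vec)
    scores = {}
    for name, entity in entity_vecs.items():
        combined = [e + s for e, s in zip(entity, schema_vecs[name])]
        scores[name] = cosine(q, combined)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical 3-d question embedding mapped into a 2-d knowledge space.
projection = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
question = [0.9, 0.1, 0.5]
entities = {"vaccine_A": [1.0, 0.0], "symptom_B": [0.0, 1.0]}
schemas = {"vaccine_A": [0.5, 0.0], "symptom_B": [0.0, 0.5]}
print(rank_entities(question, projection, entities, schemas))
```

In a trained system the projection would be learned jointly with the QA objective; here it is fixed by hand only to make the alignment step visible.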
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence
Abstract: Medical named entity recognition (NER) is an area in which medical named entities, such as diseases, drugs, and anatomical parts, are recognized from medical texts such as surgery reports and examination documents. Conventional medical NER methods do not make full use of unlabelled medical texts embedded in medical documents. To address this issue, we propose a medical NER approach based on pre-trained language models and a domain dictionary. First, we constructed a medical entity dictionary by extracting medical entities from labelled medical texts and collecting medical entities from other resources, such as the Yidu-N4K data set. Second, we employed this dictionary to train domain-specific pre-trained language models using unlabelled medical texts. Third, we employed a pseudo-labelling mechanism on unlabelled medical texts to automatically annotate texts and create pseudo labels. Fourth, the BiLSTM-CRF sequence tagging model was used to fine-tune the pre-trained language models. Our experiments on the unlabelled medical texts, which were extracted from Chinese electronic medical records, show that the proposed NER approach achieves strict and relaxed F1 scores of 88.7% and 95.3%, respectively.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence
Abstract: Computational prediction of in-hospital mortality in the setting of an intensive care unit can help clinical practitioners to guide care and make early decisions for interventions. As clinical data are complex and varied in their structure and components, continued innovation of modelling strategies is required to identify architectures that can best model outcomes. In this work, we trained a Heterogeneous Graph Model (HGM) on electronic health record (EHR) data and used the resulting embedding vector as additional information added to a Convolutional Neural Network (CNN) model for predicting in-hospital mortality. We show that the additional information provided by including time as a vector in the embedding captured the relationships between medical concepts, lab tests, and diagnoses, which enhanced predictive performance. We found that adding HGM to a CNN model increased the mortality prediction accuracy by up to 4%. This framework serves as a foundation for future experiments involving different EHR data types on important healthcare prediction tasks.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-27 Partner journal: Data Intelligence
Abstract: In this paper we present the results of the Interactive Argument-Pair Extraction in Judgement Document Challenge held by both the Chinese AI and Law Challenge (CAIL) and the Chinese National Social Media Processing Conference (SMP), and introduce the related data set SMP-CAIL2020-Argmine. The task challenged participants to choose the correct argument among five candidates proposed by the defense to refute or acknowledge the given argument made by the plaintiff, given the full context recorded in the judgement documents of both parties. We received entries from 63 competing teams, 38 of which scored higher than the provided baseline model (BERT) in the first phase and entered the second phase. The best performing system in the two phases achieved accuracy of 0.856 and 0.905, respectively. We also summarize the participating systems, highlighting commonalities and innovations among them. The SMP-CAIL2020-Argmine data set and baseline models have already been released.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-25 Partner journal: Data Intelligence
Abstract: We study the problem of the unsupervised learning of graphical models in mixed discrete-continuous domains. The problem of unsupervised learning of such models in discrete domains alone is notoriously challenging, compounded by the fact that inference is computationally demanding. The situation is generally believed to be significantly worse in discrete-continuous domains: estimating the unknown probability distribution of given samples is often limited in practice to a handful of parametric forms, and in addition, computing conditional queries needs to carefully handle low-probability regions in safety-critical applications. In recent years, the regime of tractable learning has emerged, which attempts to learn a graphical model that permits efficient inference. Most of the results in this regime are based on arithmetic circuits, for which inference is linear in the size of the obtained circuit. In this work, we show how, with minimal modifications, such regimes can be generalized by leveraging efficient density estimation schemes based on piecewise polynomial approximations. Our framework is realized on a recent computational abstraction that permits efficient inference for a range of queries in the underlying language. Our empirical results show that our approach is effective, and allows a study of the trade-off between the granularity of the learned model and its predictive power.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: The FAIR data guiding principles have been recently developed and widely adopted to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets in the face of an exponential increase of data volume and complexity. The FAIR data principles have been formulated on a general level and the technological implementation of these principles remains up to the industries and organizations working on maximizing the value of their data. Here, we describe the data management and curation methodologies and best practices developed for FAIRification of clinical exploratory biomarker data collected from over 250 clinical studies. We discuss the data curation effort involved, the resulting output, and the business and scientific impact of our work. Finally, we propose prospective planning for FAIR data to optimize data management efforts and maximize data value.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: As a basic task of military document information extraction, Named Entity Recognition (NER) for military documents has received great attention. In 2020, the China Conference on Knowledge Graph and Semantic Computing (CCKS) and the System Engineering Research Institute of the Academy of Military Sciences (AMS) issued an NER task for test evaluation, which requires the recognition of four types of entities: Test Elements (TE), Performance Indicators (PI), System Components (SC) and Task Scenarios (TS). Due to the particularity and confidentiality of the military field, only 400 items of annotated data were provided by the organizer. In this paper, the task is regarded as a few-shot learning problem for NER, and a method based on BERT and two-level model fusion is proposed. First, several basic models are obtained by fine-tuning BERT on the training data. Then, a two-level fusion strategy is applied to the prediction results of the multiple basic models to alleviate the over-fitting problem. Finally, labeling errors are eliminated by post-processing. This method achieves an F1 score of 0.7203 on the test set of the evaluation task.
Category: Computer Science >> Integration Theory of Computer Science Submitted: 2022-11-18 Partner journal: Data Intelligence
Abstract: Modern information systems require the orchestration of ontologies, conceptual data modeling techniques, and efficient data management so as to provide a means for better informed decision-making and to keep up with new requirements in organizational needs. A major question in delivering such systems is which components to design and put together to make up the required knowledge-to-data pipeline, as each component and process has trade-offs. In this paper, we introduce a new knowledge-to-data architecture, KnowID. It pulls together recently proposed components and adds novel transformation rules between Enhanced Entity-Relationship (EER) and the Abstract Relational Model to complete the pipeline. KnowID's main distinctive architectural features, compared to other ontology-based data access approaches, are that runtime use can avail of the closed world assumption commonly used in information systems and of full SQL augmented with path queries.
Category: Computer Science >> Computer Application Technology Submitted: 2022-04-18
Abstract: In the self-service baggage check-in and sorting of civil aviation, automatically detecting whether a pallet has been added to self-dropped baggage is an essential function, but the pallet is largely obscured by the baggage placed in it, which makes detection a challenging problem. To address this issue, a fast detection method for pallets with embedded baggage, based on multi-layer skeleton model registration, is proposed. To describe the characteristics of the pallet, a point cloud skeleton model and a point-line model are constructed from the 3D point cloud model. During online detection, the designed banded feature description and extraction method is used to extract the border point cloud, and the proposed point-line potential energy iterative algorithm is used to register the point-line model with the horizontal border points. Then, iterative closest point registration of the point clouds based on random sample consensus is used to achieve accurate registration and pose calculation. As a result, the likelihood that a pallet is present is judged. The effectiveness of the algorithm is verified by a variety of actual pallet detection experiments. A variety of typical comparative experiments show that when up to 70% of the pallet point cloud is missing, the algorithm still maintains an accuracy of 94%, and its speed is more than 6 times that of typical algorithms.
Category: Computer Science >> Computer Application Technology Submitted: 2020-04-14
Abstract: Low-light images suffer from severe noise and low illumination. Current deep learning models that are trained with real-world images have excellent noise reduction, but a ratio parameter must be chosen manually to complete the enhancement pipeline. In this work, we propose an adaptive low-light raw image enhancement network to avoid parameter-handcrafting and to improve image quality. The proposed method can be divided into two sub-models: Brightness Prediction (BP) and Exposure Shifting (ES). The former is designed to control the brightness of the resulting image by estimating a guideline exposure time t_1. The latter learns to approximate an exposure-shifting operator ES, converting a low-light image with real exposure time t_0 to a noise-free image with guideline exposure time t_1. Additionally, structural similarity (SSIM) loss and Image Enhancement Vector (IEV) are introduced to promote image quality, and a new Campus Image Dataset (CID) is proposed to overcome the limitations of the existing datasets and to supervise the training of the proposed model. In quantitative tests, it is shown that the proposed method has the lowest Noise Level Estimation (NLE) score compared with BM3D-based low-light algorithms, suggesting a superior denoising performance. Furthermore, those tests illustrate that the proposed method is able to adaptively control the global image brightness according to the content of the image scene. Lastly, the potential application in video processing is briefly discussed.