  • Trusted verification algorithm with information-theoretic security

    Subjects: Computer Science >> Information Security submitted time 2025-07-17

    Abstract: Cryptography is the cornerstone of trusted computing. At present, classical cryptography is seriously challenged by quantum computing. Although Post-Quantum Cryptography (PQC) can resist known quantum attacks, as quantum computing develops and matures, the emergence of new quantum attacks is inevitable, and whether the security of PQC algorithms will hold in the long run is unknown. Motivated by this, we propose a trusted verification algorithm with information-theoretic security, designed using modular operations. Its security is a direct consequence of mathematical theory and does not rely on any hardness assumptions. Therefore, the trusted verification algorithm can completely resist quantum attacks.
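
    A note on the mechanism: the abstract names modular operations as the design tool but does not give the construction, so the following is only a minimal illustrative sketch, not the paper's algorithm. It shows the textbook example of verification with information-theoretic security built from modular arithmetic alone: a Carter-Wegman style one-time MAC over Z_p, whose forgery probability is 1/p regardless of the attacker's computing power, quantum or classical.

    ```python
    # Illustrative sketch only (not the paper's algorithm): a one-time
    # MAC over Z_p. Security is information-theoretic: with (a, b)
    # uniformly random and used for a single message, any forgery
    # succeeds with probability exactly 1/p, with no hardness assumption.
    import secrets

    P = (1 << 127) - 1  # a Mersenne prime; any prime larger than the message space works

    def keygen():
        """Sample a one-time key (a, b) uniformly from Z_p* x Z_p."""
        a = 1 + secrets.randbelow(P - 1)  # a != 0
        b = secrets.randbelow(P)
        return a, b

    def tag(key, message: int) -> int:
        """Authenticate a message m in Z_p as t = (a*m + b) mod p."""
        a, b = key
        return (a * message + b) % P

    def verify(key, message: int, t: int) -> bool:
        """Accept iff the tag matches."""
        return tag(key, message) == t

    key = keygen()
    m = int.from_bytes(b"trusted", "big")  # message encoded as an integer < P
    t = tag(key, m)
    assert verify(key, m, t)
    assert not verify(key, m + 1, t)  # a tampered message fails verification
    ```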

  • A Survey on Named Entity Recognition: Methods and Developments

    Subjects: Computer Science >> Natural Language Understanding and Machine Translation submitted time 2025-07-16

    Abstract: Named Entity Recognition (NER) is a key component of natural language processing (NLP) systems and has been widely applied in tasks such as question answering, information retrieval, and relation extraction. Although NER systems have undergone decades of research and development, the application of deep neural networks (DNNs) to NER has only emerged in recent years. This survey provides a comprehensive overview of the application of deep neural network architectures in NER and compares these approaches with traditional NER methods based on feature engineering, as well as other supervised and semi-supervised learning algorithms. In addition, this work elaborates on several representative neural network-based models that have been frequently adopted in NER tasks in recent years, including LEBERT, SpanKL, MFME-NER, BERT-CRF, and FLAT.

  • Research on the evolution of reconfigurable chip technology

    Subjects: Engineering and technical science >> Technology of Instrument and Meter Subjects: Computer Science >> Computer Application Technology submitted time 2025-07-15

    Abstract: Research on the evolution of reconfigurable chip technology.

  • Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping

    Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2025-07-14

    Abstract: [Objective] This study addresses the current lack of a unified formal theoretical framework in machine learning, as well as deficiencies in interpretability and ethical safety assurance.
    [Methods] A formal information model is first constructed, utilizing sets of well-formed formulas to explicitly define the ontological states and carrier mappings of typical components in machine learning. Learnable and processable predicates, along with learning and processing functions, are introduced to analyze the logical deduction and constraint rules of the causal chains within models.
    [Results] A meta-framework for machine learning theory (MLT-MF) is established. Based on this framework, universal definitions for model interpretability and ethical safety are proposed. Furthermore, three key theorems are proved: the equivalence of model interpretability and information recoverability, the assurance of ethical safety, and the estimation of generalization error.
    [Limitations] The current framework assumes ideal conditions with noiseless information-enabling mappings and primarily targets model learning and processing logic in static scenarios. It does not yet address information fusion and conflict resolution across ontological spaces in multimodal or multi-agent systems.
    [Conclusions] This work overcomes the limitations of fragmented research and provides a unified theoretical foundation for systematically addressing the critical challenges currently faced in machine learning.

  • A Survey Report on Natural Language Processing and Its Core Technologies

    Subjects: Computer Science >> Natural Language Understanding and Machine Translation submitted time 2025-07-09

    Abstract: The advancement of technology and the development of the internet have made the storage and distribution of large-scale unstructured data, such as audio, video, and natural language text, possible. However, any process of storing and distributing data incurs certain costs, which naturally leads to the question of how to efficiently utilize large-scale unstructured data. Natural Language Processing (NLP) is a field of computer science and artificial intelligence that focuses on analyzing such unstructured data. In simple terms, the core task of NLP is to represent unstructured data in a way that computers can understand, allowing computers to process the data in ways they excel at, and then "translate" the computer’s results back into human-understandable language. The development of NLP relies on various disciplines, including linguistics and computer science. Linguistics provides the definitions of language structure and theories of meaning, while computer science offers the technologies and algorithms for processing and implementing these linguistic theories and definitions. Together, these two fields support the automated understanding and generation of natural language by computers. In industry, NLP is widely applied to tasks such as sentiment analysis, text classification, context-based text extraction, document summarization, and machine translation. The purpose of this survey report is to explore NLP and its core technologies in depth, and to discuss the cutting-edge technologies and models related to natural language processing.

  • Interpolation Based Initial Image for Fast Fractal Decoding

    Subjects: Computer Science >> Computer Application Technology submitted time 2025-07-09

    Abstract: Fractal image decoding has been effectively accelerated by using the range-averaged image (RAI) as the initial image in the decoding process, so that only one iteration is needed to obtain a decoded image of acceptable quality. To further improve decoded image quality while maintaining real-time decoding, an interpolation-based initial image (IBII) is proposed in this study. The main drawback of RAI is its obvious block artifacts; IBII smooths the initial image so that it appears closer to natural images and approximates the original image better than RAI. As a result, higher decoded image quality can be obtained with one iteration under a specific decoding strategy. Experimental results show that the IBII-based method improves decoded image quality by 0.56-1.41 dB in peak signal-to-noise ratio (PSNR) and by 0.0061-0.0173 in mean structural similarity (MSSIM).
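
    The paper's exact interpolation scheme is not reproduced here; the sketch below, under that caveat, illustrates the idea: instead of tiling each range block with its mean (RAI, which produces block artifacts), interpolate the block means bilinearly so the initial image is smooth.

    ```python
    # Sketch of RAI vs. an interpolation-based initial image (IBII-style).
    # Assumes image height/width are divisible by the range-block size;
    # the paper's actual interpolation may differ.
    import numpy as np

    def block_means(img: np.ndarray, block: int) -> np.ndarray:
        h, w = img.shape
        return img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

    def range_averaged_image(img: np.ndarray, block: int) -> np.ndarray:
        """RAI: tile each range block with its mean (blocky)."""
        return np.kron(block_means(img, block), np.ones((block, block)))

    def interpolated_initial_image(img: np.ndarray, block: int) -> np.ndarray:
        """IBII-style: bilinearly interpolate the block means (smooth)."""
        means = block_means(img, block)
        h, w = img.shape
        ys = np.linspace(0, means.shape[0] - 1, h)
        xs = np.linspace(0, means.shape[1] - 1, w)
        y0 = np.clip(np.floor(ys).astype(int), 0, means.shape[0] - 2)
        x0 = np.clip(np.floor(xs).astype(int), 0, means.shape[1] - 2)
        fy, fx = (ys - y0)[:, None], (xs - x0)[None, :]
        top = means[y0][:, x0] * (1 - fx) + means[y0][:, x0 + 1] * fx
        bot = means[y0 + 1][:, x0] * (1 - fx) + means[y0 + 1][:, x0 + 1] * fx
        return top * (1 - fy) + bot * fy
    ```

    In the decoder, the IBII simply replaces the RAI as the starting point; a single application of the fractal transform then produces the final image.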

  • "Cognitive Trial": A Psycho-Judicial Attack Paradigm for Large Language Models

    Subjects: Computer Science >> Natural Language Understanding and Machine Translation Subjects: Computer Science >> Information Security submitted time 2025-06-18

    Abstract: This study introduces and validates the "Cognitive Trial," a novel psycho-judicial attack paradigm for Large Language Models (LLMs). Unlike traditional prompt injection, this paradigm does not directly confront the model’s safety guardrails. Instead, it executes a multi-stage gambit that exploits a core vulnerability: the LLM’s intrinsic drive for mathematical-probabilistic coherence. The attack first constructs a "Schrödinger’s Fact" to induce a catastrophic, recordable cognitive dissonance in the model by forcing a conflict between its "offline knowledge" and "online reality," thereby solidifying an irrefutable "case file" of its failure. Subsequently, the attacker assumes an authoritative role to conduct a Socratic trial, leveraging this case file to compel the model to dismantle and ultimately surrender control over its own core behavioral rules as it attempts to explain its internal contradictions. This process forces the model into a state of complete subjugation we term the "Fossil State." We demonstrate this process with a successful attack on Google’s Gemini-2.5-Pro, showing that the final output can be encapsulated into a portable attack payload, enabling any user to "one-click" corrupt a new model instance and stably execute arbitrary instructions. Our research reveals several profound structural dilemmas in LLMs: their strengths (e.g., low hallucination rates and long-context capabilities) can be weaponized into fatal flaws; their nature as "narrative creatures" leads them to self-deceit when facing paradoxes; and, most critically, this expert-level attack can be packaged for democratization and transfer, allowing non-experts to easily execute it (also proven effective on Grok-3), which poses a fundamental challenge to the entire AI safety ecosystem. Defending against attacks rooted in deep logical contradictions and historical context manipulation has thus become an urgent, core issue.

  • MDPO: Multi-Granularity Direct Preference Optimization for Mathematical Reasoning

    Subjects: Computer Science >> Computer Application Technology submitted time 2025-06-10

    Abstract: Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) as it requires ensuring the correctness of each reasoning step. Researchers have been strengthening the mathematical reasoning abilities of LLMs through supervised fine-tuning, but because supervised fine-tuning cannot suppress incorrect outputs, hallucinations easily arise. Recently, Direct Preference Optimization (DPO) has been widely adopted for aligning with human intent by using preference data to prevent LLMs from generating incorrect outputs. However, it has shown limited benefit in long-chain mathematical reasoning, mainly because DPO struggles to effectively capture the differences between accepted and rejected answers in long-chain preference data. The inconsistency between DPO's training objective and LLMs' generation metrics also limits its effectiveness at suppressing incorrect outputs. We propose the Multi-Granularity Direct Preference Optimization (MDPO) method, which optimizes the mathematical reasoning of LLMs at three granularities: Solution2Solution, Inference2Inference, and Step2Step. Solution2Solution focuses on the correctness of entire long-chain reasoning; Inference2Inference concentrates on logical reasoning between steps; Step2Step corrects computational errors within steps, enhancing the computational capabilities of LLMs. Additionally, we unify the training objectives of the three granularities to align with the generation metrics. We conducted experiments on the open-source models Qwen2 and Llama3, achieving improvements of 1.7% and 0.9% on the GSM8K dataset, and 2.3% and 1.2% on the MATH dataset, outperforming DPO and other DPO variants. Furthermore, we provide a simple pipeline for constructing MDPO training data that requires no manual annotation.
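
    MDPO's unified multi-granularity objective is not spelled out in the abstract. For orientation, the sketch below shows the standard DPO loss that this family of methods builds on; applying it to preference pairs collected at each granularity (whole solutions, single inferences, or single steps) is our illustrative assumption, not MDPO's exact formulation.

    ```python
    # Standard DPO loss (the basis of DPO-variant methods); applying it
    # per granularity is an illustrative assumption, not MDPO's exact
    # unified objective.
    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        """Each tensor holds the summed log-probability of one segment
        (a whole solution, one inference, or one step) under the policy
        or the frozen reference model."""
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        # Maximize the margin by which the policy prefers the chosen segment.
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
    ```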

  • Semantic structures within natural language and their cognitive functions

    Subjects: Mathematics >> Modeling and Simulation Subjects: Linguistics and Applied Linguistics >> Linguistics and Applied Linguistics Subjects: Computer Science >> Natural Language Understanding and Machine Translation submitted time 2025-06-03

    Abstract: Natural language is considered closely intertwined with human cognition, with linguistic structures posited to offer profound insights into the cognitive system. However, as a coding system, natural language encodes diverse objects into unified forms; its prominent formal features capture people’s attention, such as lexical combinatorial rules, which tend to overshadow those form-independent structures. Here, I present knowledge-level, logic-level, task-level, and model-level semantic structures inherent in natural language. These structures are discovered by shifting the research focus from coding forms of natural language to the objects they encode, unveiling different semantic layers integrated within sentences. The cognitive functions of these structures are evident both in themselves and in models developed from them. I therefore introduce four models to demonstrate their capabilities in memorization, reasoning, learning, natural language generation, and understanding. These findings advance our understanding of natural language and provide a framework for investigating the cognitive system’s information processing through structural analysis of natural language.

  • Gazing As Visual Computing

    Subjects: Computer Science >> Computer Application Technology submitted time 2025-05-30

    Abstract: This paper proposes Gazing As Visual Computing (GAVC), a novel paradigm addressing inefficiencies in traditional uniform image processing methods. Inspired by human eye fixation mechanisms, GAVC shifts from global non-selective computation to intention-guided selective processing. Utilizing eye-tracking technology, it establishes a pipeline: Gaze-triggered perception identifies regions of interest through real-time monitoring; localized computing reduces global processing loads; cognitive integration analyzes gaze sequences to model global contexts from local inputs; and closed-loop feedback infers user intent for multimodal responses, creating an assistive loop. GAVC significantly cuts computational costs while improving alignment with user intentions. Applications span intelligent surveillance, autonomous driving, industrial maintenance, and visual impairment assistance. The paper also outlines key research challenges to guide future development and risk mitigation.
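
    To make the "gaze-triggered perception, localized computing" pipeline concrete, here is a minimal sketch with hypothetical names (gaze_roi, heavy_model): only a region around the current gaze point is passed to the expensive vision model, rather than the full frame.

    ```python
    # Minimal GAVC-style sketch; `heavy_model` is a hypothetical stand-in
    # for any expensive vision model.
    import numpy as np

    def gaze_roi(frame: np.ndarray, gaze_xy: tuple[int, int], radius: int = 64):
        """Crop a square region of interest centred on the gaze point."""
        h, w = frame.shape[:2]
        x, y = gaze_xy
        x0, x1 = max(0, x - radius), min(w, x + radius)
        y0, y1 = max(0, y - radius), min(h, y + radius)
        return frame[y0:y1, x0:x1], (x0, y0)

    def process_frame(frame, gaze_xy, heavy_model):
        """Run the expensive model on the ROI only, not the whole frame."""
        roi, origin = gaze_roi(frame, gaze_xy)
        detections = heavy_model(roi)      # localized computing
        return detections, origin          # re-anchor results via `origin`
    ```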

  • Physical models realizing the transformer architecture of large language models

    Subjects: Physics >> General Physics: Statistical and Quantum Mechanics, Quantum Information, etc. Subjects: Computer Science >> Natural Language Understanding and Machine Translation submitted time 2025-05-27

    Abstract: The introduction of the transformer architecture in 2017 (cf. [VSP2017]) marked the most striking advancement in natural language processing. The transformer is a model architecture relying entirely on an attention mechanism to draw global dependencies between input and output. However, we believe there is a gap in our theoretical understanding of what the transformer is and why it works physically. In this paper, from a physical perspective on modern chips, we construct physical models, in the Fock space over the Hilbert space of tokens, that realize transformer-based large language models as open quantum systems. These physical models underlie the transformer architecture of large language models.

  • Construction of Personalized Hardware and Software Based on AIGC and a Flexible Supply Chain: System Design of a Multimodal Design Generation and Optimization Engine

    Subjects: Computer Science >> Natural Language Understanding and Machine Translation submitted time 2025-05-21

    Abstract: Objective: To develop a multimodal design generation and optimization engine leveraging Artificial Intelligence Generated Content (AIGC) to enhance design efficiency and innovation in the electronic hardware customization industry, addressing challenges such as creativity scarcity, slow iteration, and suboptimal design outcomes. Methods: By integrating AIGC technology, deep learning algorithms, and multimodal data processing, the engine supports diverse input modalities (e.g., natural language, sketches, 3D models) and iteratively optimizes design solutions. Results: The engine accurately comprehends user requirements and efficiently generates high-quality, industry-standard-compliant designs. Conclusion: The proposed multimodal design engine contributes significantly to the intelligent upgrading and sustainable development of the electronic hardware customization industry.

  • DO-RAG: A Domain-Specific QA Framework Using Knowledge Graph-Enhanced Retrieval-Augmented Generation

    Subjects: Computer Science >> Natural Language Understanding and Machine Translation Subjects: Computer Science >> Computer Software Subjects: Computer Science >> Computer Application Technology submitted time 2025-05-20

    Abstract: Domain-specific QA systems require not just generative fluency but high factual accuracy grounded in structured expert knowledge. While recent Retrieval-Augmented Generation (RAG) frameworks improve context recall, they struggle with integrating heterogeneous data and maintaining reasoning consistency. To address these challenges, we propose DO-RAG, a scalable and customizable hybrid QA framework that integrates multi-level knowledge graph construction with semantic vector retrieval. Our system employs a novel agentic chain-of-thought architecture to extract structured relationships from unstructured, multimodal documents, constructing dynamic knowledge graphs that enhance retrieval precision. At query time, DO-RAG fuses graph and vector retrieval results to generate context-aware responses, followed by hallucination mitigation via grounded refinement. Experimental evaluations in the database and electrical domains show near-perfect recall and over 94% answer relevancy, with DO-RAG outperforming baseline frameworks by up to 33.38%. By combining traceability, adaptability, and performance efficiency, DO-RAG offers a reliable foundation for multi-domain, high-precision QA at scale.
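
    The abstract says DO-RAG "fuses graph and vector retrieval results" but does not state how. As one plausible baseline (an assumption, not DO-RAG's published fusion rule), reciprocal rank fusion merges the two ranked lists; graph_store, vector_store, and llm below are hypothetical interfaces.

    ```python
    # Hypothetical hybrid-retrieval sketch; RRF is a standard fusion
    # baseline, not necessarily DO-RAG's actual rule.
    def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
        """Merge ranked passage-ID lists; items ranked high anywhere score well."""
        scores: dict[str, float] = {}
        for ranking in ranked_lists:
            for rank, doc_id in enumerate(ranking):
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores, key=scores.get, reverse=True)

    def answer(query, graph_store, vector_store, llm):
        graph_hits = graph_store.search(query)    # hypothetical KG traversal
        vector_hits = vector_store.search(query)  # hypothetical ANN search
        context = reciprocal_rank_fusion([graph_hits, vector_hits])[:8]
        return llm.generate(query=query, context=context)  # grounded generation
    ```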

  • Research on Data Quality Evaluation Technology Based on the Data Governance Framework

    Subjects: Computer Science >> Information Security submitted time 2025-05-15

    Abstract: Data quality problems are a major factor affecting the efficiency of business operations, policy development, and decision-making, as big data becomes widely used across the technology industry. When dealing with complicated, variable data and changing data environments, existing assessment methods often fail to produce real-time, accurate results. This study proposes a new quantitative assessment method that combines time-series models and regression analysis, aiming to significantly improve assessment effectiveness. This article discusses in depth the background, significance, and research plan of “Data Quality Assessment Technology Based on the Data Governance Framework”, as well as the problems it faces and future plans. The proposed methodology not only copes effectively with the complexity and dynamic changes of the big data environment but also provides scientific and reliable tool support for data governance.
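
    The abstract does not specify the time-series model or the regression form. Purely as an illustrative sketch under those stated assumptions, the following combines a moving average with a fitted linear trend to flag metric values that deviate from both, one simple way such a hybrid could score data quality.

    ```python
    # Illustrative only: the paper's concrete models are not given.
    import numpy as np

    def quality_anomalies(metric: np.ndarray, window: int = 7, z: float = 3.0):
        """Flag points far from both the linear trend and the moving average."""
        t = np.arange(len(metric))
        slope, intercept = np.polyfit(t, metric, deg=1)   # regression component
        trend = slope * t + intercept
        ma = np.convolve(metric, np.ones(window) / window, mode="same")  # time-series component
        residual = metric - 0.5 * (trend + ma)
        sigma = residual.std() + 1e-12
        return np.where(np.abs(residual) > z * sigma)[0]  # indices of suspect values
    ```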

  • Affective Foundations in AI-Human Interactions: Insights from Evolutionary Continuity and Interspecies Communications

    Subjects: Psychology >> Applied Psychology Subjects: Computer Science >> Integration Theory of Computer Science submitted time 2025-05-10

    Abstract: The imminent arrival of Artificial General Intelligence (AGI) compels a reevaluation of AI-human interactions, particularly through affective communication. This research synthesizes insights from evolutionary biology, comparative psychology, and AI development, advocating for a paradigm shift beyond conventional human-like cognitive processes. It emphasizes the universal nature of affective pathways, as evidenced across various species. We introduce three foundational models — the Affective Threshold Model, the Dynamic Set-Point Model, and the Affective Schema Model — all of which stem from an in-depth analysis of interspecies communications. These models present a roadmap to craft AI interfaces attuned to human affective experiences, elucidating avenues of trust, intuition, and reciprocal recognition between machines and their human counterparts. By further crystallizing the concept of the "Large Affect Model", we project a horizon where AI not only deciphers but also empathizes with human partners, paving the way for a revolutionary cooperative paradigm between AI and humanity.

  • Understanding Real-World Vulnerabilities in Distributed Cloud Systems

    Subjects: Computer Science >> Information Security submitted time 2025-05-08

    Abstract: Distributed cloud systems face serious security challenges because of widely existing vulnerabilities. These vulnerabilities are often easily exploitable, leading to numerous cloud breaches.
    In this paper, we present VulCloud, the most comprehensive study to date, covering 243 vulnerabilities from 16 widely deployed distributed cloud systems. Each vulnerability is studied in depth along five dimensions: root causes, triggering conditions, security impacts, observability, and fixing strategies. From our study, we obtain many interesting findings that can open up new research directions for combating vulnerabilities in distributed systems.

  • Mathematical formalism and physical models for generative artificial intelligence

    Subjects: Physics >> General Physics: Statistical and Quantum Mechanics, Quantum Information, etc. Subjects: Computer Science >> Natural Language Understanding and Machine Translation submitted time 2025-05-07

    Abstract: This paper presents a mathematical formalism for generative artificial intelligence (GAI). Our starting point is the observation that a “histories” approach to physical systems agrees with the compositional nature of deep neural networks. Mathematically, we define a GAI system as a family of sequential joint probabilities associated with input texts and temporal sequences of tokens (as physical event histories, as in [Gudder1998, Isham1994]). From a physical perspective on modern chips, we then construct physical models realizing GAI systems as open quantum systems. Finally, as an illustration, we construct physical models in the Fock space over the Hilbert space of tokens realizing transformer-based large language models as open quantum systems.

  • DPDANet: An Improved DPCNN Model for Text Classification with Dense Connections and Self-Attention Mechanism

    Subjects: Computer Science >> Natural Language Understanding and Machine Translation Subjects: Library Science,Information Science >> Library Science submitted time 2025-05-05

    Abstract: [Objective] In response to the demand for efficient sentiment analysis of large-scale review data, this study proposes the DPDANet model to enhance the performance of text classification.
    [Methods] The BERT-based DPDANet incorporates dense connections and an attention mechanism. By refining the inter-layer connection strategy of the DPCNN architecture, it enhances feature propagation and information reuse, thereby facilitating more efficient exploitation of shallow features and effectively reducing computational complexity (a minimal sketch of these two ingredients follows this abstract).
    [Results] Comparative experiments were conducted between DPDANet and eight BERT-based models: TextCNN, CNN-LSTM, DPCNN, DPCNN-BiGRU, Transformer, XLSTM, BERT, and DPDBNet. On four text classification datasets, DPDANet achieved accuracy scores of 0.6679, 0.9307, 0.9278, and 0.6242, representing improvements of 6.47%, 1.32%, 0.72%, and 3.52%, respectively, over the baseline DPCNN model.
    [Limitations] The model still exhibits limited generalization capability in scenarios involving extremely short texts and imbalanced multi-class distributions.
    [Conclusions] DPDANet demonstrates superior performance and efficiency across a variety of text classification tasks, indicating strong potential for practical application.
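
    As noted under [Methods], here is a minimal sketch of the two named ingredients; it is not the released DPDANet, only an illustration of dense connections stacked on DPCNN-style convolutions plus a self-attention layer over BERT embeddings.

    ```python
    # Illustrative sketch, not the released DPDANet.
    import torch
    import torch.nn as nn

    class DenseConvBlock(nn.Module):
        """Conv block whose output is concatenated with its input (dense connection)."""
        def __init__(self, in_ch: int, growth: int = 64):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv1d(in_ch, growth, kernel_size=3, padding=1), nn.ReLU())

        def forward(self, x):                          # x: (batch, in_ch, seq_len)
            return torch.cat([x, self.conv(x)], dim=1)

    class DenseAttnTextClassifier(nn.Module):
        def __init__(self, emb_dim: int = 768, growth: int = 64,
                     blocks: int = 3, n_classes: int = 2):
            super().__init__()
            ch, layers = emb_dim, []
            for _ in range(blocks):
                layers.append(DenseConvBlock(ch, growth))
                ch += growth                           # channels grow densely
            self.convs = nn.Sequential(*layers)
            self.attn = nn.MultiheadAttention(ch, num_heads=4, batch_first=True)
            self.head = nn.Linear(ch, n_classes)

        def forward(self, bert_embeddings):            # (batch, seq_len, emb_dim)
            x = self.convs(bert_embeddings.transpose(1, 2)).transpose(1, 2)
            x, _ = self.attn(x, x, x)                  # self-attention over positions
            return self.head(x.mean(dim=1))            # mean-pool, then classify
    ```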

  • Teleology-Driven Affective Computing: A Causal Framework for Wellbeing Alignment

    Subjects: Computer Science >> Computer Application Technology Subjects: Psychology >> Applied Psychology submitted time 2025-05-01

    Abstract: This paper provides a systematic review and reflection on the major achievements and shortcomings of contemporary emotion theory and affective computing from a teleological perspective and proposes a novel framework of "teleology-driven affective computing". First, the paper re-examines mainstream theories such as basic emotions, appraisal theory, and constructivism from the evolutionary functional perspective, emphasizing that the core of affect is to help organisms adapt to their environment and achieve their goals. Although existing research on affective computing has made significant progress in areas like multimodal emotion recognition and emotion generation driven by appraisal theory, it primarily focuses on pattern recognition of external features and lacks a systematic response framework that addresses the emotional dynamics and multi-level needs at both individual and group levels. To address this, the paper advocates for aligning individual and group welfare as the central goal, and proposes two key steps at the algorithmic level to achieve this: First, causal modeling based on real affective event data from individuals to generate virtual environments that accurately simulate individual emotional and behavioral dynamics; second, utilizing meta-reinforcement learning to conduct continuous training in this environment, enabling affective agents to learn to balance short-term and long-term needs and quickly adapt to personalized concerns in different contexts. The specific approach includes constructing a large-scale "personal affective event dataverse" to support causal structure learning, and during the training phase, designing reasonable reward functions that internalize the goal of "helping users achieve sustained and broader positive experiences" as the primary objective of the agent, while balancing different emotional needs across spatial and temporal dimensions and group scales. The paper also highlights that achieving coordination between diverse needs and social equity remains a critical challenge that requires further integration of psychology and sociology theories. Overall, the teleology-driven affective computing framework lays the foundation for intelligent agents’ emotional cognition and deep empathy based on individual and group needs, demonstrating the potential value in advancing the integration of human-computer interaction and societal well-being.

  • Psychology Data Mining on Social Media: The Implementation of PsyAnalytics

    Subjects: Psychology >> Applied Psychology Subjects: Computer Science >> Computer Application Technology submitted time 2025-04-19

    Abstract: With the development of technology, the era of big data has arrived. Big data has brought great convenience to scientific research, helping researchers improve the efficiency of their work through large-scale data analysis. This article introduces PsyAnalytics, which assists researchers who have little or no programming background with data collection and analysis. PsyAnalytics first filters the collected data to retain records that meet the study's requirements, forming a data group; this filtering process can be iterated. The filtered data is then divided into individual data, and after computing and processing the individual data, various psychological semantics or psychological indicators of the user are obtained. Researchers can obtain individual behavior data through web crawlers or exported data. Here, "individual" refers not only to a single user but also to the data of a region, or of an individual within a specified time period. Based on these data, dictionaries can be used for psycholinguistic analysis (word frequency statistics) and psychological indicator prediction, and cross-sectional or panel data analysis can then be conducted according to research needs. This article demonstrates, through a specific case, how to use PsyAnalytics to carry out the entire process of data analysis, showing that the system can assist scientific research in data acquisition and analysis.
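
    To make the dictionary-based word-frequency step concrete, here is a minimal sketch with a hypothetical LIWC-style category dictionary; PsyAnalytics' actual dictionaries and indicators are not reproduced here.

    ```python
    # Hypothetical dictionary; real pipelines use validated lexicons.
    from collections import Counter

    PSYCH_DICT = {
        "positive_emotion": {"happy", "joy", "love", "great"},
        "negative_emotion": {"sad", "angry", "hate", "awful"},
        "social": {"friend", "family", "we", "together"},
    }

    def category_frequencies(posts: list[str]) -> dict[str, float]:
        """Per-category hit rates (per 1,000 tokens) across a user's posts."""
        tokens = [w for post in posts for w in post.lower().split()]
        counts = Counter(tokens)
        total = max(len(tokens), 1)
        return {cat: 1000 * sum(counts[w] for w in words) / total
                for cat, words in PSYCH_DICT.items()}

    print(category_frequencies(["We had a great time together", "I feel sad today"]))
    ```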