Abstract:
[Objective] To systematically review the methodological framework and application progress of scientific hypothesis generation driven by large language models (LLMs), and to characterize the current research landscape and development trends in this field. [Coverage] Using keywords such as "Large Language Models" and "Scientific Hypothesis Generation", we searched databases including Web of Science (WOS), Google Scholar, and CNKI. Representative literature from 2021 to 2026 was screened, yielding a final set of 98 papers for analysis. [Methods] An analytical framework was established along three dimensions: the logic of the generation process, the evolution of technical pathways, and key issues. Existing approaches at each of the four stages (knowledge acquisition, preliminary hypothesis generation, iterative refinement, and evaluation and validation) were systematically reviewed. The underlying technical architectures were compared, core difficulties and current solutions were examined in depth, and relevant benchmark datasets and representative applications were summarized. [Results] The capabilities of LLMs in knowledge integration and association discovery offer a new paradigm for scientific hypothesis generation and have already yielded experimentally verified hypotheses in real-world scenarios across multiple domains. Current research shows a trend toward synergy among five technical pathways: context engineering, supervised fine-tuning, reinforcement learning, planning and search, and multi-agent collaboration. A preliminary methodology has taken shape for the core generation process; however, challenges remain in knowledge clue discovery, innovative hypothesis reasoning, and credibility, with model hallucination and limited intrinsic reasoning capability being the primary bottlenecks. [Limitations] Because this emerging interdisciplinary field is evolving rapidly, some of the most recent works may not be fully covered. This study focuses on reviewing methodological frameworks and does not provide a systematic quantitative performance comparison of current methods. [Conclusions] Large language models have demonstrated the capability to assist in generating, or even to autonomously generate, scientifically valuable hypotheses, enabling scalable and cross-disciplinary hypothesis exploration. Future research should seek breakthroughs in balancing reliability with novelty, enhancing deep reasoning capabilities, developing new paradigms for human-AI collaboration, and establishing closed-loop integration between hypothesis generation and experimental verification.