Abstract Details

ID / Submission Time

151 / 2025-04-10 16:10:20

Title

GeoGPT: Transforming Paleontology with AI-Powered Data Extraction and Analysis

Keywords

GeoGPT, Geobiological Knowledge Extraction, Multimodal Artificial Intelligence

Theme and Session

Theme 1: Novel Tools and Techniques in Geobiology > Session 1a: Geobiological Big Data and Models

Status

Abstract accepted

Authors

Ye Yufei / Zhejiang Lab

James Ogg / Purdue University; Chengdu University of Technology

Juye Wei / Zhejiang Lab

Zongyuan Xiang / Zhejiang Lab

Zhong Peng / Zhejiang Lab

Shuang Li / Zhejiang Lab

Shao Qi Yu / Zhejiang Lab

Jiang Yang / Zhejiang Lab

Abstract

The exponential growth of geobiological literature presents unprecedented challenges in data extraction efficiency, particularly when dealing with century-spanning paleontological archives and complex stratigraphic records. Traditional methods struggle with three core issues:

Weak semantic associations in long-form geological texts spanning multiple research paradigms;

Structural complexity of cross-page scientific tables with nested hierarchies;

Incompatible data formats across historical publications.

To address these challenges, we present GeoGPT, a non-profit, domain-specific multimodal AI system engineered for mining geobiological knowledge. GeoGPT integrates the following technological frameworks:

Multimodal Architecture for Scientific Document Analysis. Our hybrid intelligence system bridges macro-scale semantic comprehension with micro-scale pattern detection, creating an integrated pipeline for parsing text narratives, tabular hierarchies and schematic diagrams in geoscience literature. This multimodal architecture specifically addresses the critical challenge of digitizing legacy data trapped in historical monographs and technical reports: it automatically extracts fragmented paleontological observations from multi-format documents and transforms them into structured digital records. The structured outputs directly support large-scale evolutionary analyses by providing computationally tractable representations of taxonomic relationships, stratigraphic distributions and morphological characteristics preserved in century-old scientific archives.
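As an illustration of what a structured digital record of this kind might look like, the following minimal Python sketch defines a fossil occurrence record and derives a per-taxon stratigraphic range from a set of records. The schema, the field names and the function `stratigraphic_ranges` are assumptions made for this example and are not GeoGPT's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FossilOccurrence:
    """One structured record from a legacy document (hypothetical schema)."""
    taxon: str          # taxon name as printed in the source
    stage: str          # stratigraphic unit reported in the source
    age_top_ma: float   # younger age bound, in millions of years
    age_base_ma: float  # older age bound, in millions of years
    source_doc: str     # identifier of the source document
    page: int           # page on which the observation appears
    traits: tuple = ()  # free-form morphological descriptors


def stratigraphic_ranges(records):
    """Collapse occurrences into a (youngest, oldest) age range per taxon."""
    ranges = {}
    for r in records:
        top, base = ranges.get(r.taxon, (r.age_top_ma, r.age_base_ma))
        ranges[r.taxon] = (min(top, r.age_top_ma), max(base, r.age_base_ma))
    return ranges


# Placeholder values for illustration only, not real data.
records = [
    FossilOccurrence("Taxon A", "Stage X", 10.0, 12.0, "doc1", 3),
    FossilOccurrence("Taxon A", "Stage Y", 11.0, 15.0, "doc2", 7),
    FossilOccurrence("Taxon B", "Stage X", 10.0, 12.0, "doc1", 4),
]
ranges = stratigraphic_ranges(records)
```

Keeping the source document and page on every record is one simple way to make later range or diversity analyses traceable back to the original publication.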

Data Extraction Pipeline. Our cognitive-driven workflow goes beyond conventional end-to-end extraction paradigms through intent-aware computational design. By implementing demand decomposition via semantic requirement parsing, the system dynamically disambiguates extraction objectives and allocates subtasks across hybrid processing modules. This architecture combines GeoGPT's domain-specific knowledge retrieval with computer vision-driven diagram analysis, employing prompt-chaining mechanisms to maintain contextual coherence across multi-page documents. Crucially, the pipeline incorporates multistage verification loops in which extracted entities undergo automated reconciliation with source visual elements through graph-based backtracking algorithms. This design achieves three fundamental advancements: 1) significant mitigation of LLM hallucination through constraint-satisfaction processing; 2) full traceability of data provenance via task-specific lineage tracking; and 3) scalable adaptability from specimen-level feature extraction to ecosystem-scale pattern mining, capabilities that are unattainable through monolithic model approaches.
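The decompose-then-verify idea described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not GeoGPT's implementation: the names `Extraction`, `decompose`, `verify` and `run_pipeline` are hypothetical, the extractor is passed in as a plain callable standing in for a model call, and the verification step is a literal string match against the cited page rather than the graph-based backtracking mentioned in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    field_name: str
    value: str
    page: int               # page the extractor cites as the source
    verified: bool = False


def decompose(request):
    """Split a comma-separated request into individual field subtasks."""
    return [part.strip() for part in request.split(",") if part.strip()]


def verify(extraction, pages):
    """Accept a value only if it literally occurs on the cited page."""
    text = pages.get(extraction.page, "")
    value = extraction.value.lower()
    extraction.verified = bool(value) and value in text.lower()
    return extraction


def run_pipeline(request, extractor, pages):
    """Run one extractor call per subtask and sort results by verification."""
    accepted, rejected = [], []
    for field_name in decompose(request):
        candidate = extractor(field_name, pages)
        if candidate is None:
            continue  # the extractor found nothing for this subtask
        if verify(candidate, pages).verified:
            accepted.append(candidate)
        else:
            rejected.append(candidate)
    return accepted, rejected


# Stub extractor standing in for a model call (illustrative values only).
def stub_extractor(field_name, pages):
    answers = {
        "taxon": Extraction("taxon", "Taxon A", 1),
        "polarity": Extraction("polarity", "normal", 2),
    }
    return answers.get(field_name)


pages = {
    1: "Specimens of Taxon A were collected from Bed 4.",
    2: "Polarity is reversed.",
}
accepted, rejected = run_pipeline("taxon, polarity, lithology", stub_extractor, pages)
```

In this sketch the unsupported "polarity" value is rejected because it does not appear on the page it cites, while each accepted value keeps its page reference as a minimal form of provenance.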

Benchmark Construction and Validation. Our interdisciplinary team has developed a tiered annotation framework combining AI-assisted pre-annotation with expert-led verification. The workflow begins with domain specialists from paleontology, paleomagnetism and petroleum geology defining entity taxonomies and stratigraphic relationship schemas. Trained annotators then perform initial labeling using our custom platform, which integrates active learning strategies to prioritize ambiguous cases for expert review. Current benchmarks encompass hundreds of peer-reviewed papers and technical reports, yielding 4,347 annotated instances across three disciplines: fossil occurrence records (31%), paleomagnetic polarity sequences (23%) and hydrocarbon reservoir characteristics (49%). Each data point undergoes dual validation through cross-referencing with original visual elements and reconciliation with domain knowledge bases.
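The abstract does not specify which active learning strategy the annotation platform uses. One common choice is entropy-based uncertainty sampling, sketched below under that assumption: items whose predicted label distribution is closest to uniform are sent to experts first. The function names and the example probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def prioritize_for_review(predictions, budget):
    """Return ids of the `budget` most ambiguous items, highest entropy first."""
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:budget]


# Illustrative model outputs over three candidate labels per item.
predictions = {
    "item_a": [0.98, 0.01, 0.01],  # confident prediction
    "item_b": [0.34, 0.33, 0.33],  # nearly uniform, most ambiguous
    "item_c": [0.60, 0.30, 0.10],  # moderately uncertain
}
review_queue = prioritize_for_review(predictions, budget=2)
```

With a fixed expert budget, ranking by entropy spends review time on the cases where pre-annotation is least reliable.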

At the time of submission of this abstract, our ongoing development focuses on three strategic priorities:

Specialized Model Training - Optimizing domain-specific extraction architectures to handle complex stratigraphic diagrams while maintaining computational efficiency.

Cross-Domain Dataset Construction - Curating benchmark datasets spanning paleoclimate proxies, geochemical analyses and planetary surface features to enable systematic validation.

AI-Reasoning Optimization - Developing domain-specific large language models with automated reasoning mechanisms that combine contextual logic parsing with dynamic knowledge graph integration, enhancing accuracy in deciphering ambiguous stratigraphic correlations and cross-modal geological patterns.

These parallel initiatives are establishing new paradigms for AI-assisted knowledge extraction in Earth sciences. Initial applications demonstrate robust performance in processing materials science literature and astrogeological reports, indicating the framework's adaptability across geoscience and adjacent disciplines.

Important Dates

Conference Dates

June 10, 2025 to June 13, 2025

Draft Submission Deadline

April 15, 2025

Hosts

National Natural Science Foundation of China
Geobiology Society
National Committee of Stratigraphy of China
Ministry of Science and Technology
Geological Society of China
Paleontological Society of China
Nanjing Institute of Geology and Palaeontology, Chinese Academy of Sciences (CAS)
Institute of Vertebrate Paleontology and Paleoanthropology, CAS
International Commission on Stratigraphy
International Paleontological Association

Organizer

State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences (CUG, Wuhan)

Contact

Yizhou Huang
yi******@cug.edu.cn
186********


Past Conferences

June 10, 2025, Wuhan, China
The 5th International Conference of Geobiology
June 24, 2017, Wuhan, China
The 4th International Conference of Geobiology

The 5th International Conference of Geobiology