15 / 2025-03-23 15:16:12
Online Database Suite Implementation of the Treatise on Invertebrate Paleontology
Database, AI, Paleontology, Phanerozoic
Abstract status: pending review
James Ogg / Chengdu University of Technology
Aditya Sivathanu / Purdue University
Kevin Chang / Purdue University
Aaron Ault / Purdue University
Samyukta Balaji / Purdue University
Jieping Ye / Zhejiang Lab
Hongyang Chen / Zhejiang Lab
Yufei Ye / Zhejiang Lab
Zongyuan Xiang / Zhejiang Lab
            The Treatise on Invertebrate Paleontology is a definitive multi-authored work of some 50 volumes, written and continually enhanced by more than 300 paleontologists over the course of seven decades. It covers all of the major groups of fossil and extant (still living) invertebrate animals. Each volume is approximately 500 to 1000 pages long, and the coverage of a phylum typically details about 1000 to 4000 genera. For example, the six brachiopod volumes published in the 2000s describe ca. 4000 brachiopod genera. For each genus, the information provided includes taxonomy, morphology, and stratigraphic and paleogeographic range. Notably, the Treatise was a key source for the groundbreaking paleodiversity curves of marine genera by J. John Sepkoski, Jr. (1996, 2002).

            Until recently, the Treatise was only available as printed volumes or for-purchase PDFs. Now, nearly the entire series is available as free PDF downloads from the Paleontological Institute, University of Kansas (https://journals.ku.edu/InvertebratePaleo/issue/archive; current editor is Bruce S. Lieberman). However, the data in these PDFs are not in a format that scientists can readily search or analyze, which makes the present work timely. To address this, the Treatise staff and a team of undergraduate computer-engineering students at Purdue University, Indiana, began a collaboration to migrate the volumes into a searchable online database (temporarily hosted at https://treatise.geolex.org, to be migrated later to the University of Kansas). The separate phylum-based sub-sites are searchable by genus name, geologic time, numerical date, or date range. Each genus entry contains separate fields for taxonomic hierarchy, type species, synonyms, descriptions (eventually with images), the beginning and end of its range in geologic and numerical time, and location.
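
            The genus record and the searches by name and by numerical date described above can be sketched as a small data structure. This is a minimal Python sketch under stated assumptions: the class and field names (`GenusEntry`, `base_ma`, `top_ma`, and the rest) are hypothetical, ages are in Ma (millions of years ago, so larger means older), and it is not the actual schema of the treatise.geolex.org database.

```python
from dataclasses import dataclass, field


@dataclass
class GenusEntry:
    """One genus record. Field names are hypothetical, loosely following
    the fields listed for the online Treatise database."""
    name: str
    hierarchy: list[str]            # e.g. [phylum, class, order, family]
    type_species: str
    synonyms: list[str] = field(default_factory=list)
    description: str = ""
    base_stage: str = ""            # oldest unit of the range, as cited
    top_stage: str = ""             # youngest unit of the range, as cited
    base_ma: float = 0.0            # numerical age of range base (older)
    top_ma: float = 0.0             # numerical age of range top (younger)
    location: str = ""


def search_by_name(entries, name):
    """Case-insensitive match on the valid genus name or any synonym."""
    key = name.lower()
    return [e for e in entries
            if e.name.lower() == key or key in (s.lower() for s in e.synonyms)]


def search_by_date_range(entries, young_ma, old_ma):
    """Genera whose range overlaps [young_ma, old_ma].

    A single numerical date is queried with young_ma == old_ma.
    """
    return [e for e in entries if e.top_ma <= old_ma and e.base_ma >= young_ma]
```

Keeping both the cited stage names and the converted numerical ages on each record lets the same entry answer searches by geologic time and by numerical date.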

            The Treatise PDFs contain a tremendous amount of information, and the individual volumes have followed a fairly consistent format over many decades, which helps in extracting that information to populate databases. A Python routine and a workflow of AI tools (developed by a team involved in the GeoGPT project) were applied to recognize the paragraph containing the information on each genus, followed by rule-based parsing of the details in that paragraph into an Excel spreadsheet. The computer-generated Excel spreadsheets are checked and enhanced before being uploaded into the online database system.
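
            The rule-based parsing step can be illustrated with regular expressions. The sketch below assumes a simplified, hypothetical layout for a genus paragraph (genus name, author and year, type species inside `[*...]`, a free-text description, and a closing `range: location.` clause); real Treatise entries vary more than this, and the GeoGPT-based AI workflow that locates the paragraphs is not shown. The patterns, function names, and example entry are illustrative assumptions, and the output is written as CSV as a stand-in for the Excel spreadsheet.

```python
import csv
import re

# Hypothetical, simplified layout of one genus paragraph:
#   Genus AUTHOR, year, p. N [*Type species ...; OD]. Description. Range: location.
HEADER_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+(?P<author>[A-Z][A-Za-z .&'-]*?),\s*(?P<year>\d{4})")
TYPE_RE = re.compile(r"\[\*(?P<type_species>[A-Z][a-z]+ [a-z]+)")
# The last "range: location." clause at the end of the paragraph.
RANGE_RE = re.compile(r"(?P<range>[^.:]+):\s*(?P<location>[^.:]+)\.?\s*$")

FIELDS = ["genus", "author", "year", "type_species", "description", "range", "location"]


def parse_genus_entry(text):
    """Split one genus paragraph into spreadsheet columns.

    Returns None when the paragraph does not fit the assumed pattern,
    so it can be flagged for manual checking.
    """
    header = HEADER_RE.match(text)
    type_sp = TYPE_RE.search(text)
    if not header or not type_sp or "]" not in text:
        return None
    # Text after the type-species bracket: description, then range and location.
    body = text[text.index("]") + 1:].lstrip(". ")
    rng = RANGE_RE.search(body)
    if not rng:
        return None
    return {
        "genus": header["genus"],
        "author": header["author"].strip(),
        "year": int(header["year"]),
        "type_species": type_sp["type_species"],
        "description": body[:rng.start()].strip(),
        "range": rng["range"].strip(),
        "location": rng["location"].strip(),
    }


def write_rows(rows, path):
    """Write parsed rows as CSV (a stand-in for the Excel spreadsheet)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Returning None for non-matching paragraphs, rather than guessing, fits the workflow described above, where the generated spreadsheets are checked before upload.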

            Ages in the Treatise are given as either regional or international stages/substages. These were converted to numerical age ranges using a conversion table of approximately 1000 entries that gives the numerical age range of each regional or international chronostratigraphic unit.
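
            A minimal sketch of the stage-to-numerical-age conversion, assuming a dictionary keyed by unit name and a range written as `older unit–younger unit` with an en dash. The three entries and their ages are approximate values chosen for illustration only, not rows from the project's conversion table, and the function name `range_to_ma` is hypothetical.

```python
# Illustrative conversion table: unit name -> (base_ma, top_ma), older age first.
# Values are approximate, rounded ages for illustration only; the project's
# ~1000-entry table is not reproduced here.
STAGE_AGES = {
    "Upper Ordovician": (458.2, 443.1),
    "Lower Silurian": (443.1, 433.4),
    "Devonian": (419.6, 358.9),
}


def range_to_ma(range_text, table=STAGE_AGES):
    """Convert 'Unit' or 'OlderUnit–YoungerUnit' (en dash) to (base_ma, top_ma) in Ma."""
    oldest, _, youngest = range_text.partition("–")
    oldest = oldest.strip()
    youngest = youngest.strip() or oldest   # a single unit gives its own range
    try:
        base_ma = table[oldest][0]          # base of the oldest unit
        top_ma = table[youngest][1]         # top of the youngest unit
    except KeyError as err:
        raise ValueError(f"unit not in conversion table: {err.args[0]!r}") from None
    if base_ma < top_ma:
        raise ValueError(f"reversed range (older unit must come first): {range_text!r}")
    return base_ma, top_ma
```

For example, `range_to_ma("Upper Ordovician–Lower Silurian")` takes the base of the older unit and the top of the younger one. Raising an error for unknown units, instead of skipping them, surfaces gaps in the table during checking.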

            In turn, the online databases enable: (1) automatic generation of diversity curves by phylum, class, or order; (2) one-click transfer of diversity curves into the TimeScale Creator visualization system for display against biozones, climate, or other trends in Earth history; and (3) automated linking of genus names cited in other datasets, such as fossils within a geologic formation or in stratigraphic columns of the OneStratigraphy database, to the webpage and image of the corresponding Treatise entry.
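
            Diversity curves of this kind are commonly built by range-through counting: a genus is counted in every time bin that its range overlaps. The sketch below assumes that convention and hypothetical input (dicts with `base_ma`, `top_ma`, and a rank key such as `order`); the counting method actually used by the online databases, and the export format for TimeScale Creator, are not shown here.

```python
from collections import defaultdict


def diversity_curve(genera, bins, rank="order"):
    """Range-through genus counts per time bin, grouped by a taxonomic rank.

    genera: iterable of dicts with keys rank, 'base_ma', 'top_ma' (ages in Ma)
    bins:   list of (label, base_ma, top_ma) tuples, oldest bin first
    Returns {group name: [count per bin]}.
    """
    curves = defaultdict(lambda: [0] * len(bins))
    for g in genera:
        for i, (_, bin_base, bin_top) in enumerate(bins):
            # Count the genus if its range overlaps the bin by more than a boundary point.
            if g["base_ma"] > bin_top and g["top_ma"] < bin_base:
                curves[g[rank]][i] += 1
    return dict(curves)
```

The bins could be taken from the same stage conversion table, giving one count per stage for each phylum, class, or order.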

            At the time of submission of this abstract (March 2025), about 20% of the Treatise volumes had been extracted, to varying degrees of completeness, into the public online Treatise database suite. Future challenges and goals include: (1) using computational approaches to convert textual geographic locations of genera (e.g., “northern Poland”) into GeoJSON polygons, so that users can visualize their former biogeographic positions on plate reconstructions for the relevant age; (2) extracting basic information on the individual orders and classes from the Treatise volumes, which lacks the structure of the genus entries; (3) using the Treatise information, with the help of GeoGPT, to construct taxonomic keys for classifying fossils; (4) interlinking with other external websites that contain non-technical summaries and images; and (5) expanding the suite of databases sharing a common search and visualization system to also include vertebrates and flora (including dinoflagellates and spore-pollen).
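
            As a rough illustration of goal (1), the sketch below turns a location phrase into a GeoJSON Polygon feature using a hypothetical gazetteer of bounding boxes and a crude assumed rule that a compass qualifier keeps the corresponding half of the box. The gazetteer, its approximate box for Poland, and the function name are assumptions for illustration; a real workflow would need a proper gazetteer or geocoder, and rotating the polygon to its paleoposition on a plate reconstruction is a separate step not shown.

```python
import json

# Hypothetical gazetteer of rough bounding boxes: (min_lon, min_lat, max_lon, max_lat).
# The Poland box is an approximate, hand-entered extent for illustration.
GAZETTEER = {"poland": (14.1, 49.0, 24.2, 54.8)}
QUALIFIERS = {"northern", "southern", "eastern", "western"}


def location_to_geojson(text, gazetteer=GAZETTEER):
    """Map text such as 'northern Poland' to a GeoJSON Feature with a box Polygon.

    Returns None when the region is not in the gazetteer.
    """
    words = text.lower().split()
    qualifier = words[0] if words and words[0] in QUALIFIERS else None
    region = " ".join(words[1:] if qualifier else words)
    box = gazetteer.get(region)
    if box is None:
        return None
    min_lon, min_lat, max_lon, max_lat = box
    mid_lon, mid_lat = (min_lon + max_lon) / 2, (min_lat + max_lat) / 2
    # Crude rule: a compass qualifier keeps the corresponding half of the box.
    if qualifier == "northern":
        min_lat = mid_lat
    elif qualifier == "southern":
        max_lat = mid_lat
    elif qualifier == "eastern":
        min_lon = mid_lon
    elif qualifier == "western":
        max_lon = mid_lon
    # Closed exterior ring in [lon, lat] order, counterclockwise as RFC 7946 recommends.
    ring = [[min_lon, min_lat], [max_lon, min_lat], [max_lon, max_lat],
            [min_lon, max_lat], [min_lon, min_lat]]
    return {
        "type": "Feature",
        "properties": {"source_text": text},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


print(json.dumps(location_to_geojson("northern Poland")))
```

Keeping the original phrase in `properties` preserves the Treatise wording alongside the approximate polygon.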



            We thank Bruce Lieberman and Natalia Lopez-Carranza of the Paleontological Institute, University of Kansas; the Deep-time Digital Earth big-science program of the IUGS; the Vertically Integrated Projects program of the School of Electrical and Computer Engineering, Purdue University; the Key Laboratory of Deep-time Geography and Environment Reconstruction at Chengdu University of Technology; and the Zhejiang Laboratory.

 
Important dates
  • Conference dates: June 10–13, 2025
  • First-draft submission deadline: April 15, 2025

Sponsors
National Natural Science Foundation of China
Geobiology Society
National Committee of Stratigraphy of China
Ministry of Science and Technology
Geological Society of China
Paleontological Society of China
Nanjing Institute of Geology and Palaeontology, Chinese Academy of Sciences (CAS)
Institute of Vertebrate Paleontology and Paleoanthropology, CAS
International Commission on Stratigraphy
International Paleontological Association
Organizer
State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences (CUG, Wuhan)