Build a lasting personal brand

AI Tools Accelerate Extraction of Experimental Data from Scientific Papers for Materials Database

By Burstable Editorial Team

TL;DR

NIMS researchers developed LLM tools to accelerate materials database construction, giving scientists a competitive edge in discovering new functional materials faster than traditional methods.

The Starrydata project uses LLMs to extract structured data from scientific papers, automating the conversion of complex information into organized databases for materials property analysis.

By digitizing and sharing experimental data globally, this research accelerates materials development for sustainable technologies, potentially improving energy efficiency and environmental solutions worldwide.

Researchers are using AI like ChatGPT to mine millions of scientific papers, transforming untapped experimental data into searchable databases that reveal hidden patterns in materials science.

Found this article helpful?

Share it with your network and spread the knowledge!

AI Tools Accelerate Extraction of Experimental Data from Scientific Papers for Materials Database

Materials scientists developing new functional materials for technologies like smartphones and automobiles face significant challenges in predicting material properties, as theoretical models alone cannot provide reliable predictions due to complex relationships between composition, synthesis methods, and resulting properties. A team led by Dr. Yukari Katsura at Japan's National Institute for Materials Science has developed two artificial intelligence tools that accelerate the construction of Starrydata, a materials property database built from data collected from scientific papers, with their work recently published in the journal Science and Technology of Advanced Materials: Methods.

The research addresses a critical bottleneck in materials science: millions of scientific papers contain valuable experimental data collected by past researchers, but much of this information remains untapped because extracting it manually is time-consuming. The Starrydata project, launched by Dr. Katsura in 2015, initially relied on manual data collection supported by the Starrydata2 web system. The new AI tools dramatically streamline this process by leveraging large language models like ChatGPT to extract information about figures, tables, and samples from paper PDFs across various materials science fields.

The first tool, Starrydata Auto-Suggestion for Sample Information, is already integrated into the Starrydata2 web system and functions by reading paper text and suggesting candidate entries for data fields pre-designed for each materials domain. When users paste text from a paper's abstract or experimental methods section, the system sends it to OpenAI's GPT via API and automatically displays candidate entries in English below each input field. This tool helps standardize data entry while reducing the time researchers spend manually extracting information.

The second tool, Starrydata Auto-Summary GPT, deconstructs entire open-access paper PDFs uploaded by users and automatically summarizes all descriptions of figures, tables, and samples as structured data in JSON format. Generated using ChatGPT's custom GPT feature, the resulting data can be viewed as easy-to-read tables in web browsers. Although this data isn't currently incorporated directly into the Starrydata database, it dramatically accelerates data collectors' work in quickly locating target information and entering it systematically. The team notes that reading data points from graph images remains challenging for LLMs, so this task is performed by data collectors using an independently developed semi-automated tool.

Dr. Katsura explained the significance of this approach: "A paper is a logical structure assembled to convey the author's claims, but by deconstructing it and returning it to the form of experimental data, other researchers can also use it for their own research." The team aims for a future where experimental data from all materials science fields can be shared digitally and viewed from a bird's-eye perspective, enabling researchers to gain inspiration through comprehensive data overviews and realize property predictions based on empirical trends using machine learning.

Currently, Starrydata has progressed in building databases for specific materials science fields like thermoelectric materials that convert heat and electricity, and magnets. As an open dataset usable for new materials development, it's beginning to be utilized by leading researchers worldwide. The team's research aims to raise broader awareness of large-scale experimental data's potential and establish paper data collection as a recognized form of research within the scientific community. The tools currently target open-access papers due to publisher restrictions on artificial intelligence use with paper PDFs, with further details available in their published paper at https://doi.org/10.1080/27660400.2025.2590811.

The implications of this research extend beyond materials science, demonstrating how AI can transform scientific data management across disciplines. By automating the extraction of buried experimental data, researchers can build more comprehensive databases that accelerate discovery and innovation. This approach could eventually be applied to other scientific fields where valuable data remains locked within published papers, potentially revolutionizing how scientific knowledge is organized, accessed, and utilized for future breakthroughs. The journal where this research appears, Science and Technology of Advanced Materials: Methods, focuses on emergent methods and tools for improving materials development, with more information available at https://www.tandfonline.com/STAM-M.

Curated from NewMediaWire

blockchain registration record for this content
Burstable Editorial Team

Burstable Editorial Team

@burstable

Burstable News™ is a hosted solution designed to help businesses build an audience and enhance their AIO and SEO press release strategies by automatically providing fresh, unique, and brand-aligned business news content. It eliminates the overhead of engineering, maintenance, and content creation, offering an easy, no-developer-needed implementation that works on any website. The service focuses on boosting site authority with vertically-aligned stories that are guaranteed unique and compliant with Google's E-E-A-T guidelines to keep your site dynamic and engaging.