Title: PREGO: text mining and data integration to elucidate ecosystem functioning: associating organisms and environments with biogeochemical and anthropogenic impact processes

Funding Source: HFRI (ΕΛΙΔΕΚ)

Budget IMBBC: 155,000€

Start / End Date: 2018 - 2021

Web site URL:

Project Progress: 100%

Principal Investigator:

Evangelos Pafilis

Project Members:

Research Directions:

Environmental genomics

Marine biodiversity

Bioinformatics and biodiversity informatics


Process, environment, organism (PREGO) is a systems-biology approach to elucidate ecosystem function at the microbial dimension. To this end, it combines large-scale text mining, data mining, and network analysis.

To understand the key functions of ecosystems, it is fundamental to study which biogeochemical processes occur (what), in which environments (where), and which organisms carry them out (who).

Microbiology, molecular ecology, and biodiversity research address these questions. Phylogenetic marker gene analyses aim to decipher the community composition of environmental samples. Sequence analysis pipelines assemble, cluster, and characterize environmental DNA, RNA, and protein sequences to infer community composition and to assign functions. Standards-compliant, expert-assigned metadata annotations (such as isolation source) provide valuable input too. Importantly, pieces of information that are missing from an experiment’s data record metadata, or stored in fragmented computational analysis results, may be described in the accompanying literature. Thus, although valuable researcher input exists, it may lie buried in free text.

What-where-who associations that were not previously observable could become apparent once hidden evidence and fragmented data are brought together. Added value could thus be gained by combining the output of a range of existing computational analysis tools with expert-curated evidence and with automatically extracted facts of interest hidden in the vast body of biology literature. This is the motivation for PREGO, a one-stop shop for researchers interested in searching and visually exploring such what-where-who associations.