2022
Paragkamian, Savvas; Sarafidou, Georgia; Mavraki, Dimitra; Pavloudi, Christina; Beja, Joana; Eliezer, Menashè; Lipizer, Marina; Boicenco, Laura; Vandepitte, Leen; Perez-Perez, Ruben; Zafeiropoulos, Haris; Arvanitidis, Christos; Pafilis, Evangelos; Gerovasileiou, Vasilis. Automating the Curation Process of Historical Literature on Marine Biodiversity Using Text Mining: The DECO Workflow. Journal Article. Frontiers in Marine Science, 9, pp. 940844, 2022, ISSN: 2296-7745.
@article{paragkamian_automating_2022,
  title = {Automating the Curation Process of Historical Literature on Marine Biodiversity Using Text Mining: The DECO Workflow},
  author = {Savvas Paragkamian and Georgia Sarafidou and Dimitra Mavraki and Christina Pavloudi and Joana Beja and Menashè Eliezer and Marina Lipizer and Laura Boicenco and Leen Vandepitte and Ruben Perez-Perez and Haris Zafeiropoulos and Christos Arvanitidis and Evangelos Pafilis and Vasilis Gerovasileiou},
  url = {https://imbbc.hcmr.gr/wp-content/uploads/2022/07/2022-Paragkaminan-fmars-53.pdf},
  doi = {10.3389/fmars.2022.940844},
  issn = {2296-7745},
  year = {2022},
  date = {2022-01-01},
  urldate = {2022-07-29},
  journal = {Frontiers in Marine Science},
  volume = {9},
  pages = {940844},
  abstract = {Historical biodiversity documents comprise an important link to the long-term data life cycle and provide useful insights on several aspects of biodiversity research and management. However, because of their historical context, they present specific challenges, primarily time- and effort-consuming in data curation. The data rescue process requires a multidisciplinary effort involving four tasks: (a) Document digitisation (b) Transcription, which involves text recognition and correction, and (c) Information Extraction, which is performed using text mining tools and involves the entity identification, their normalisation and their co-mentions in text. Finally, the extracted data go through (d) Publication to a data repository in a standardised format. Each of these tasks requires a dedicated multistep methodology with standards and procedures. During the past 8 years, Information Extraction (IE) tools have undergone remarkable advances, which created a landscape of various tools with distinct capabilities specific to biodiversity data. These tools recognise entities in text such as taxon names, localities, phenotypic traits and thus automate, accelerate and facilitate the curation process. Furthermore, they assist the normalisation and mapping of entities to specific identifiers. This work focuses on the IE step (c) from the marine historical biodiversity data perspective. It orchestrates IE tools and provides the curators with a unified view of the methodology; as a result the documentation of the strengths, limitations and dependencies of several tools was drafted. Additionally, the classification of tools into Graphical User Interface (web and standalone) applications and Command Line Interface ones enables the data curators to select the most suitable tool for their needs, according to their specific features. In addition, the high volume of already digitised marine documents that await curation is amassed and a demonstration of the methodology, with a new scalable, extendable and containerised tool, “DECO” (bioDivErsity data Curation programming wOrkflow) is presented. DECO’s usage will provide a solid basis for future curation initiatives and an augmented degree of reliability towards high value data products that allow for the connection between the past and the present, in marine biodiversity research.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
2021
Chatzinikolaou, Eva; Damianidis, Panagiotis; Pavloudi, Christina; Vasileiadou, Aikaterini; Faulwetter, Sarah; Keklikoglou, Kleoniki; Plaitis, Wanda; Mavraki, Dimitra; Nikolopoulou, Stamatina; Arvanitidis, Christos. Benthic communities in three Mediterranean touristic ports: MAPMED project. Journal Article. Biodiversity Data Journal, 9, pp. e66420, 2021, ISSN: 1314-2828.
@article{chatzinikolaou_benthic_2021,
  title = {Benthic communities in three Mediterranean touristic ports: MAPMED project},
  author = {Eva Chatzinikolaou and Panagiotis Damianidis and Christina Pavloudi and Aikaterini Vasileiadou and Sarah Faulwetter and Kleoniki Keklikoglou and Wanda Plaitis and Dimitra Mavraki and Stamatina Nikolopoulou and Christos Arvanitidis},
  url = {https://bdj.pensoft.net/article/66420/ https://imbbc.hcmr.gr/wp-content/uploads/2021/05/2021-Chatzinikolaou-DioDiv-Data-J-32.pdf},
  doi = {10.3897/BDJ.9.e66420},
  issn = {1314-2828},
  year = {2021},
  date = {2021-04-01},
  urldate = {2021-04-27},
  journal = {Biodiversity Data Journal},
  volume = {9},
  pages = {e66420},
  abstract = {Mediterranean ports are sources of significant economic activity and at the same time they act as recipients of considerable anthropogenic disturbance and pollution. Macrobenthic communities are an important component of the port biota and have been used as environmental quality indicators. Macrobenthic assemblages were recorded in three Mediterranean touristic ports under the framework of the ENPI CBC MED project MAPMED. Samples were collected from Cagliari (Sardinia, Italy), Heraklion (Crete, Greece) and El Kantaoui (Tunisia) ports during February, May and September 2012. The sampling stations were selected according to the different sectors within each port (i.e. leisure, fishing, passenger/cargo vessels, shipyard). A total number of 277 taxa belonging to 12 phyla were found, of which the 96 taxa were found in all three ports. El Kantaoui port hosted the highest number of macrobenthic taxa. Mollusca were the most abundant group (34%) in all ports. The highest percentage of opportunistic taxa per station was found before the touristic period in the shipyard of Heraklion port (89.3%).},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
2016
Faulwetter, Sarah; Pafilis, Evangelos; Fanini, Lucia; Bailly, Nicolas; Agosti, Donat; Arvanitidis, Christos; Boicenco, Laura; Capatano, Terry; Claus, Simon; Dekeyzer, Stefanie; Georgiev, Teodor; Legaki, Aglaia; Mavraki, Dimitra; Oulas, Anastasis; Papastefanou, Gabriella; Penev, Lyubomir; Sautter, Guido; Schigel, Dmitry; Senderov, Viktor; Teaca, Adrian; Tsompanou, Marilena. EMODnet Workshop on mechanisms and guidelines to mobilise historical data into biogeographic databases. Journal Article. Research Ideas and Outcomes, 2, pp. e9774, 2016, ISSN: 2367-7163.
@article{faulwetter_emodnet_2016,
  title = {EMODnet Workshop on mechanisms and guidelines to mobilise historical data into biogeographic databases},
  author = {Sarah Faulwetter and Evangelos Pafilis and Lucia Fanini and Nicolas Bailly and Donat Agosti and Christos Arvanitidis and Laura Boicenco and Terry Capatano and Simon Claus and Stefanie Dekeyzer and Teodor Georgiev and Aglaia Legaki and Dimitra Mavraki and Anastasis Oulas and Gabriella Papastefanou and Lyubomir Penev and Guido Sautter and Dmitry Schigel and Viktor Senderov and Adrian Teaca and Marilena Tsompanou},
  url = {http://rio.pensoft.net/articles.php?id=9774},
  doi = {10.3897/rio.2.e9774},
  issn = {2367-7163},
  year = {2016},
  date = {2016-07-01},
  urldate = {2020-09-21},
  journal = {Research Ideas and Outcomes},
  volume = {2},
  pages = {e9774},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Mavraki, D; Fanini, L; Tsompanou, M; Gerovasileiou, V; Nikolopoulou, S; Chatzinikolaou, E; Plaitis, W; Faulwetter, S. Rescuing biogeographic legacy data: The "Thor" Expedition, a historical oceanographic expedition to the Mediterranean Sea. Journal Article. Biodiversity Data Journal, 4 (1), 2016, ISSN: 1314-2828, (Publisher: Pensoft Publishers).
@article{mavraki_rescuing_2016,
  title = {Rescuing biogeographic legacy data: The "Thor" Expedition, a historical oceanographic expedition to the Mediterranean Sea},
  author = {D Mavraki and L Fanini and M Tsompanou and V Gerovasileiou and S Nikolopoulou and E Chatzinikolaou and W Plaitis and S Faulwetter},
  url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85018653928&doi=10.3897%2fBDJ.4.e11054&partnerID=40&md5=d548ea13a533ece32aee66df08954a37},
  doi = {10.3897/BDJ.4.e11054},
  issn = {1314-2828},
  year = {2016},
  date = {2016-01-01},
  journal = {Biodiversity Data Journal},
  volume = {4},
  number = {1},
  abstract = {Background: This article describes the digitization of a series of historical datasets based on the reports of the 1908-1910 Danish Oceanographical Expeditions to the Mediterranean and adjacent seas. All station and sampling metadata as well as biodiversity data regarding calcareous rhodophytes, pelagic polychaetes, and fish (families Engraulidae and Clupeidae) obtained during these expeditions were digitized within the activities of the LifeWatchGreece Research Infrastructure project and presented in the present paper. The aim was to safeguard public data availability by using an open access infrastructure, and to prevent potential loss of valuable historical data on the Mediterranean marine biodiversity. New information: The datasets digitized here cover 2,043 samples taken at 567 stations during a time period from 1904 to 1930 in the Mediterranean and adjacent seas. The samples resulted in 1,588 occurrence records of pelagic polychaetes, fish (Clupeiformes) and calcareous algae (Rhodophyta). In addition, basic environmental data (e.g. sea surface temperature, salinity) as well as meteorological conditions are included for most sampling events. In addition to the description of the digitized datasets, a detailed description of the problems encountered during the digitization of this historical dataset and a discussion on the value of such data are provided. © Mavraki D et al.},
  note = {Publisher: Pensoft Publishers},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Chatzinikolaou, E; Faulwetter, S; Mavraki, D; Bourtzis, T; Arvanitidis, C. Data policy and data sharing agreement in the LifeWatchGreece Research Infrastructure. Journal Article. Biodiversity Data Journal, 4 (1), 2016, ISSN: 1314-2828, (Publisher: Pensoft Publishers).
@article{chatzinikolaou_data_2016,
  title = {Data policy and data sharing agreement in the LifeWatchGreece Research Infrastructure},
  author = {E Chatzinikolaou and S Faulwetter and D Mavraki and T Bourtzis and C Arvanitidis},
  url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85018657276&doi=10.3897%2fBDJ.4.e10849&partnerID=40&md5=ba19fabe3629875c6bf9bab0e6adf07d},
  doi = {10.3897/BDJ.4.e10849},
  issn = {1314-2828},
  year = {2016},
  date = {2016-01-01},
  journal = {Biodiversity Data Journal},
  volume = {4},
  number = {1},
  abstract = {The LifeWatchGreece Research Infrastructure (LWG RI) stores biodiversity data and information from all biology-related disciplines derived from the Greek territory (or the Mediterranean Sea for the marine data). The aim of LWG RI is to facilitate data sharing and dissemination under harmonised standards in order to maximize the socio-economic benefits of research and knowledge transfer to the public. This publication describes the rationale behind the data policy of LWG RI, outlines the current legal situation for sharing research data and presents the Data Sharing Agreement which is signed between the data owner/provider and the LWG RI for each dataset, describing in detail the rights and duties of each party, as well as the license type and the embargo period under which the data are released. © Chatzinikolaou E et al.},
  note = {Publisher: Pensoft Publishers},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
Dimitra Mavraki