Turning herbarium records into big data for plant microbiomes

Scientists are now discovering that the millions of biological collections kept behind the scenes in museums and herbaria can serve as a roadmap for understanding biological responses to global change. In this project, we propose to use data from herbarium specimens to uncover the natural history of plant-associated microbes, which are not only crucial to plants for healthy growth and protection from herbivores but are also useful sources of new medicines for the pharmaceutical industry.

Relevant Publications:

Daru, B.H., Bowman, E.A., Pfister, D.H. & Arnold, A.E. (2018) A novel proof-of-concept for capturing the diversity of endophytic fungi preserved in herbarium specimens. Philosophical Transactions of the Royal Society B 374: 20170395 doi: 10.1098/rstb.2017.0395.

phyloregion, computational infrastructure for biogeographic regionalization and macroecology in the R computing environment

Establishing geographical comparisons based on shared biota is crucial to the study of biogeography and to managing biological diversity in the face of rapid warming of the Earth’s climate. However, the computational tools to analyze and manipulate massive-scale species biogeography data have not been fully developed. The R software package phyloregion can overcome these computational challenges. It contains tools for biogeographical regionalization, macroecology, conservation, and visualization, and has potential applications in various disciplines, including evolution, microbial diversity, systematics, ecology, and phylogenetics. In this project, we plan to substantially increase the computational efficiency of functions in phyloregion, to add new functionality, and to create a model for user-guided software development in biogeography.

Grade of Membership Model
Figure 1. Workflow to identify biogeographic regions using a Grade of Membership model. (A) A community matrix (optionally paired with a phylogeny or functional traits) is (B) analyzed using a Grade of Membership model, which allows the units of analysis (i.e., species or regions) to have partial membership in multiple clusters. (C) This generates values of omega (cluster contribution per grid cell) and theta (species’ contribution per cluster) as outputs. (D) Biogeographic regions are visualized using pie charts that represent the proportional contribution of species to grid cells.
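
To make the two outputs in the workflow above more concrete, the following is a minimal Python sketch of what omega and theta look like for a toy community matrix. It is not phyloregion's implementation: it fits a simple multinomial mixed-membership model by plain expectation-maximization as a stand-in, and the function names (`normalize`, `fit_gom`), the toy counts, and the choice of two clusters are all assumptions made for illustration.

```python
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    if total > 0:
        return [v / total for v in values]
    # Fall back to a uniform vector when there is no mass to distribute.
    return [1.0 / len(values)] * len(values)


def fit_gom(counts, k, n_iter=200, seed=0):
    """Fit a toy mixed-membership model by expectation-maximization.

    counts: rows are grid cells, columns are species (hypothetical data).
    Returns (omega, theta):
      omega[i][c] = contribution of cluster c to grid cell i (rows sum to 1)
      theta[c][j] = contribution of species j to cluster c (rows sum to 1)
    """
    rng = random.Random(seed)
    n_cells, n_species = len(counts), len(counts[0])
    # Random positive starting values, normalized to proportions.
    omega = [normalize([rng.random() + 0.1 for _ in range(k)])
             for _ in range(n_cells)]
    theta = [normalize([rng.random() + 0.1 for _ in range(n_species)])
             for _ in range(k)]

    for _ in range(n_iter):
        new_omega = [[0.0] * k for _ in range(n_cells)]
        new_theta = [[0.0] * n_species for _ in range(k)]
        for i in range(n_cells):
            for j in range(n_species):
                if counts[i][j] == 0:
                    continue
                # E-step: split each observed count among the clusters.
                resp = [omega[i][c] * theta[c][j] for c in range(k)]
                total = sum(resp)
                if total == 0:
                    continue
                for c in range(k):
                    share = counts[i][j] * resp[c] / total
                    new_omega[i][c] += share
                    new_theta[c][j] += share
        # M-step: renormalize the accumulated shares into proportions.
        omega = [normalize(row) for row in new_omega]
        theta = [normalize(row) for row in new_theta]
    return omega, theta


# Hypothetical community matrix: 5 grid cells x 4 species.
counts = [
    [5, 4, 0, 0],
    [6, 3, 0, 0],
    [0, 0, 4, 5],
    [0, 0, 3, 6],
    [3, 2, 2, 3],
]
omega, theta = fit_gom(counts, k=2)
for cell, memberships in enumerate(omega):
    print(cell, [round(m, 2) for m in memberships])
```

Each row of `omega` is the set of proportions that one pie chart in panel (D) would display for a grid cell, which is why every row is constrained to sum to 1.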

The project will accomplish the following:

  1. Develop and implement new tools in phyloregion for biome evolution and biogeographical investigations;
  2. Develop new tools in phyloregion to visualize patterns of biogeography, macroecology and evolution; and
  3. Develop new tools in phyloregion for conservation that reflect the key dimensions of phylogenetic diversity including richness, divergence and regularity.

The phyloregion project can be accessed via CRAN and https://phyloregion.com/.
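
The three dimensions of phylogenetic diversity named in the third aim above can be illustrated on a toy tree. The sketch below is written in Python and does not use phyloregion's functions. Pairing richness with Faith's PD, divergence with mean pairwise distance, and regularity with the variance of pairwise distances is an assumption about which metrics might represent each dimension, and the tree and all function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical rooted tree: node -> (parent, branch length to parent).
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("root", 2.0),
}


def path_to_root(tree, node):
    """Return {child_node: branch_length} for every edge from node to the root."""
    edges = {}
    while tree[node][0] is not None:
        edges[node] = tree[node][1]
        node = tree[node][0]
    return edges


def faith_pd(tree, species):
    """Richness: total length of the branches linking the species to the root."""
    used = {}
    for sp in species:
        used.update(path_to_root(tree, sp))
    return sum(used.values())


def pairwise_distance(tree, a, b):
    """Path length between two tips: the edges that only one of them uses."""
    edges_a = path_to_root(tree, a)
    edges_b = path_to_root(tree, b)
    only_a = sum(length for n, length in edges_a.items() if n not in edges_b)
    only_b = sum(length for n, length in edges_b.items() if n not in edges_a)
    return only_a + only_b


def diversity_dimensions(tree, species):
    """Return one illustrative metric per dimension (needs >= 2 species)."""
    distances = [pairwise_distance(tree, a, b)
                 for a, b in combinations(species, 2)]
    return {
        "richness_pd": faith_pd(tree, species),
        "divergence_mpd": mean(distances),
        "regularity_vpd": pvariance(distances),
    }


print(diversity_dimensions(TREE, ["A", "B", "C"]))
```

In a grid-based analysis, a function like `diversity_dimensions` would be applied to the species list of each cell, giving three complementary maps rather than a single diversity score.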

GreenMaps, a Tool for Addressing the Wallacean Shortfall in the Global Distribution of Plants

The exponential growth of plant occurrence data can facilitate dynamic biodiversity analyses. However, raw plant diversity data should not be used indiscriminately because they carry inherent sampling biases, which contribute to the Wallacean shortfall (i.e., the paucity of information on species’ geographic distributions). Here, we propose GreenMaps, a new tool that will permit a rapid initial assessment of the Wallacean shortfall for plants by building base maps of species’ predicted distributions, which citizen scientists can then help validate against the ranges that species actually occupy. The initial stages of GreenMaps have now been completed, providing a massive dataset of modeled range maps for over 230,000 vascular plant species. Ultimately, GreenMaps will interface with a mobile application that enables volunteers from any region of the world to validate predicted species distributions, which will be used to generate new and improved global maps of plant distributions at scales relevant to research and conservation.