Biography

At the DOE Joint Genome Institute, I lead the Viral Genomics group where we explore viruses of microbes and their impacts on ecosystems using (mostly) fancy ‘omics tools. Our current projects include the study of viral diversity and virus:host interactions in soil and freshwater environments, along with the development of new bioinformatics tools and experimental protocols to probe and characterize uncultivated viruses. We also assist users of the JGI Metagenome Program with their analysis, including identification of viral sequences, functional annotation, taxonomic classification, etc.

The long-term goal of my research is to understand the ecological and evolutionary drivers of virus:host dynamics in natural microbial communities. This research involves a mix of experimental and computational approaches spanning from the molecular to the ecosystem scale, trying to address fundamental questions like “how do viruses spread and adapt across environments?”, “how do viruses take over and reprogram microbial cells?”, and “how do viral infections alter ecosystem processes?”.

Interests

  • Microbial/Viral Ecology
  • (Viral) Metagenomics
  • Virus Evolution

Education

  • PhD in Microbial Ecology, 2013

    Université Blaise Pascal, Clermont-Ferrand II (now Université Clermont Auvergne)

  • MSc in Data Analysis and Modeling for Life Sciences, 2010

    Université Blaise Pascal, Clermont-Ferrand II (now Université Clermont Auvergne)

  • BSc in Microbiology, 2008

    Université Blaise Pascal, Clermont-Ferrand II (now Université Clermont Auvergne)

Current Projects

Picture of East River hillslope and Green Butte biocrust - Virus-driven alterations of microbial metabolism in soil
We plan to study viral diversity and virus:host interactions in two model systems (East River hillslopes and Green Butte biocrusts) from the single-cell to the ecosystem level. Project goals include (i) characterizing new mechanisms by which viruses transform microbial cells, (ii) investigating how virus:host interactions are transformed by changes in local environmental conditions, and (iii) developing innovative methods to measure viral presence and activity in natural soil systems. 5-year project funded by the DOE Early Career Research Program, in collaboration with the Brodie Lab and the Northen Lab. More information in our latest pre-print - https://doi.org/10.1101/2023.03.06.531389 (Pictures courtesy of Tamy Swenson & Brodie Lab)

Logo of the iPHoP tool - Computational prediction of virus:host interactions
To help with viromics analysis, we have developed a tool (iPHoP) that integrates multiple signals of virus-host interactions and enables robust prediction of host genus for many uncultivated phages. Now available on bioconda, and described in the following pre-print: https://doi.org/10.1101/2022.07.28.501908

Picture of Mushroom Spring in YNP - Virus:host dynamics in Yellowstone National Park biofilms
We study virus:host dynamics across diel cycles in Octopus and Mushroom springs based on coupled metagenomics, metatranscriptomics, and viral metagenomics, to better understand phage infection triggers and synchronization in natural communities. In collaboration with the Bhaya Lab. (Picture: USGS / Thomas Brock)

Illustration of viral capsids - IMG/VR - Large-scale exploration of uncultivated viral diversity
We routinely mine public genomes, metagenomes, and metatranscriptomes for new viral sequences to progressively build a large and comprehensive genomic catalog of the virosphere. IMG/VR v4 now released! More info in the tools section. (Viral capsids drawing from Leah Pantea / http://leahpantea.com)

MVP pipeline logo - Towards a user-friendly viral ecogenomics toolkit
Developing tools to identify, clean, compare, and annotate uncultivated viral genomes (mostly) assembled from metagenomes. Currently gathered in the MVP pipeline, developed by Clement Coclet. See also the tools section.

MVP pipeline logo - Establishing the foundations of a high-throughput phage foundry
Analyzing and modeling phage diversity and phage:host interactions to better understand how microbiomes can be altered and manipulated through the addition of (engineered) phages. Project led by Vivek Mutalik.

Tools

Teaching and other Online Resources

  • Viromics workshop - The (somewhat) annual workshop dedicated to viromics analysis including viral genome assembly, identification, annotation, curation, and taxonomic classification. Hosted at Ohio State University - https://u.osu.edu/viruslab/viromics-workshop/
  • MGM Workshop - Bi-annual highly hands-on workshop designed to familiarize users with the Integrated Microbial Genomes & Microbiomes (IMG/M) data and workflows for computational analysis and interpretation of sequence data, including IMG/VR - https://mgm.jgi.doe.gov
  • VERVE Net - Collection of news, protocols, and online discussion for viral ecologists - https://www.protocols.io/groups/verve-net

Selected media coverage

JGI Podcast - Genome Insider
Nature News - Amy Maxmen - 19 March 2018
Wired - Shara Tonn - 03 Sept 2015
Nature Biotechnology - Charles Schmidt - 01 Oct 2018
Communications of the ACM - Chris Edwards - Dec 2018

Recent Publications

Viruses interact with hosts that span distantly related microbial domains in dense hydrothermal mats

Many microbes in nature reside in dense, metabolically interdependent communities. We investigated the nature and extent of microbe-virus interactions in relation to microbial density and syntrophy by examining microbe-virus interactions in a biomass dense, deep-sea hydrothermal mat. Using metagenomic sequencing, we find numerous instances where phylogenetically distant (up to domain level) microbes encode CRISPR-based immunity against the same viruses in the mat. Evidence of viral interactions with hosts cross-cutting microbial domains is particularly striking between known syntrophic partners, for example those engaged in anaerobic methanotrophy. These patterns are corroborated by proximity-ligation-based (Hi-C) inference. Surveys of public datasets reveal additional viruses interacting with hosts across domains in diverse ecosystems known to harbour syntrophic biofilms. We propose that the entry of viral particles and/or DNA to non-primary host cells may be a common phenomenon in densely populated ecosystems, with eco-evolutionary implications for syntrophic microbes and CRISPR-mediated inter-population augmentation of resilience against viruses.

You can move, but you can’t hide: identification of mobile genetic elements with geNomad

Identifying and characterizing mobile genetic elements (MGEs) in sequencing data is essential for understanding their diversity, ecology, biotechnological applications, and impact on public health. Here, we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a large dataset of marker proteins to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks that included diverse MGE and chromosome sequences, geNomad significantly outperformed other tools in all evaluated clades of plasmids and viruses. Leveraging geNomad’s speed and scalability, we were able to process public metagenomes and metatranscriptomes, leading to the discovery of millions of new viruses and plasmids that are available through the IMG/VR and IMG/PR databases. We anticipate that geNomad will enable further advancements in MGE research, and it is available at https://portal.nersc.gov/genomad.

Virus diversity and activity is driven by snowmelt and host dynamics in a high-altitude watershed soil ecosystem

Viruses, including phages, impact nearly all organisms on Earth, including microbial communities and their associated biogeochemical processes. In soils, highly diverse viral communities have been identified, with a global distribution seemingly driven by multiple biotic and abiotic factors, especially soil temperature and moisture. However, our current understanding of the stability of soil viral communities across time, and their response to strong seasonal change in environmental parameters remains limited. Here, we investigated the diversity and activity of environmental DNA and RNA viruses, including phages, across dynamic seasonal changes in a snow-dominated mountainous watershed by examining paired metagenomes and metatranscriptomes. We identified a large number of DNA and RNA viruses taxonomically divergent from existing environmental viruses, including a significant proportion of RNA viruses targeting fungal hosts and a large and unsuspected diversity of positive single-stranded RNA phages (Leviviricetes), highlighting the under-characterization of the global soil virosphere. Among these, we were able to distinguish subsets of active phages which changed across seasons, consistent with a “seed-bank” viral community structure in which new phage activity, for example replication and host lysis, is sequentially triggered by changes in environmental conditions. Zooming in at the population level, we further identified virus-host dynamics matching two existing ecological models: “Kill-The-Winner” which proposes that lytic phages are actively infecting abundant bacteria, and “Piggyback-The-Persistent” which argues that when the host is growing slowly it is more beneficial to remain in a lysogenic state. The former was associated with summer months of high and rapid microbial activity, and the latter with winter months of limited and slow host growth.
Taken together, these results suggest that the high diversity of viruses in soils is likely associated with a broad range of host interaction types each adapted to specific host ecological strategies and environmental conditions. Moving forward, as our understanding of how environmental and host factors drive viral activity in soil ecosystems progresses, integrating these viral impacts in complex natural microbiome models will be key to accurately predict ecosystem biogeochemistry.

IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata

Viruses are widely recognized as critical members of all microbiomes. Metagenomics enables large-scale exploration of the global virosphere, progressively revealing the extensive genomic diversity of viruses on Earth and highlighting the myriad of ways by which viruses impact biological processes. IMG/VR provides access to the largest collection of viral sequences obtained from (meta)genomes, along with functional annotation and rich metadata. A web interface enables users to efficiently browse and search viruses based on genome features and/or sequence similarity. Here, we present the fourth version of IMG/VR, composed of >15 million virus genomes and genome fragments, a ≈6-fold increase in size compared to the previous version. These clustered into 8.7 million viral operational taxonomic units, including 231 408 with at least one high-quality representative. Viral sequences in IMG/VR are now systematically identified from genomes, metagenomes, and metatranscriptomes using a new detection approach (geNomad), and IMG standard annotations are complemented with genome quality estimation using CheckV, taxonomic classification reflecting the latest taxonomic standards, and microbial host taxonomy prediction. IMG/VR v4 is available at https://img.jgi.doe.gov/vr, and the underlying data are available to download at https://genome.jgi.doe.gov/portal/IMG_VR.

Expansion of the global RNA virome reveals diverse clades of bacteriophages

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.

iPHoP: an integrated machine-learning framework to maximize host prediction for metagenome-assembled virus genomes

The extraordinary diversity of viruses infecting bacteria and archaea is now primarily studied through metagenomics. While metagenomes enable high-throughput exploration of the viral sequence space, metagenome-derived genomes lack key information compared to isolated viruses, in particular host association. Different computational approaches are available to predict the host(s) of uncultivated viruses based on their genome sequences, but thus far individual approaches are limited either in precision or in recall, i.e. for a number of viruses they yield erroneous predictions or no prediction at all. Here we describe iPHoP, a two-step framework that integrates multiple methods to provide host predictions for a broad range of viruses while retaining a low (<10%) false-discovery rate. Based on a large database of metagenome-derived virus genomes, we illustrate how iPHoP can provide extensive host prediction and guide further characterization of uncultivated viruses. iPHoP is available at https://bitbucket.org/srouxjgi/iphop, through a Bioconda recipe, and a Docker container.

Global overview and major challenges of host prediction methods for uncultivated phages

Bacterial communities play critical roles across all of Earth’s biomes, affecting human health and global ecosystem functioning. They do so under strong constraints exerted by viruses, that is, bacteriophages or ‘phages’. Phages can reshape bacterial communities’ structure, influence long-term evolution of bacterial populations, and alter host cell metabolism during infection. Metagenomics approaches, that is, shotgun sequencing of environmental DNA or RNA, recently enabled large-scale exploration of phage genomic diversity, yielding several millions of phage genomes now to be further analyzed and characterized. One major challenge however is the lack of direct host information for these phages. Several methods and tools have been proposed to bioinformatically predict the potential host(s) of uncultivated phages based only on genome sequence information. Here we review these different approaches and highlight their distinct strengths and limitations. We also outline complementary experimental assays which are being proposed to validate and refine these bioinformatic predictions.

Ecology and molecular targets of hypermutation in the global microbiome

Changes in the sequence of an organism’s genome, i.e., mutations, are the raw material of evolution. The frequency and location of mutations can be constrained by specific molecular mechanisms, such as diversity-generating retroelements (DGRs). DGRs have been characterized from cultivated bacteria and bacteriophages, and perform error-prone reverse transcription leading to mutations being introduced in specific target genes. DGR loci were also identified in several metagenomes, but the ecological roles and evolutionary drivers of these DGRs remain poorly understood. Here, we analyze a dataset of >30,000 DGRs from public metagenomes, establish six major lineages of DGRs including three primarily encoded by phages and seemingly used to diversify host attachment proteins, and demonstrate that DGRs are broadly active and responsible for >10% of all amino acid changes in some organisms. Overall, these results highlight the constraints under which DGRs evolve, and elucidate several distinct roles these elements play in natural communities.

Extreme dimensions — how big (or small) can tailed phages be?

This May 2021 Genome Watch highlights the search for unusually large (or small) tailed phages driven by metagenomics.

Host population diversity as a driver of viral infection cycle in wild populations of green sulfur bacteria with long standing virus-host interactions

Viral infections of bacterial hosts range from highly lytic to lysogenic, where highly lytic viruses undergo viral replication and immediately lyse their hosts, and lysogenic viruses have a latency period before replication and host lysis. While both types of infections are routinely observed in the environment, the ecological and evolutionary processes that regulate these different viral dynamics are still not well understood. In this study, we identify and characterize the long-term dynamics of uncultivated viruses infecting green sulfur bacteria (GSB) in a model freshwater lake sampled from 2005-2018. Overall, our data suggest that single GSB populations are typically infected by multiple viruses at the same time, that lytic and lysogenic viruses can readily co-infect the same host population in the same ecosystem, and that host strain-level diversity might be an important factor controlling the lytic/lysogeny switch.

Giant virus diversity and host interactions through global metagenomics

Our current knowledge about nucleocytoplasmic large DNA viruses (NCLDVs) is largely derived from viral isolates that are co-cultivated with protists and algae. Here we reconstructed 2,074 NCLDV genomes from sampling sites across the globe by building on the rapidly increasing amount of publicly available metagenome data, leading to an 11-fold increase in phylogenetic diversity and a parallel 10-fold expansion in functional diversity. We anticipate that the global diversity of NCLDVs that we describe here will establish giant viruses—which are associated with most major eukaryotic lineages—as important players in ecosystems across Earth’s biomes.

Cryptic inoviruses revealed as pervasive in bacteria and archaea across Earth’s biomes

Bacteriophages from the Inoviridae family (inoviruses) are characterized by their unique morphology, genome content and infection cycle. To date, a relatively small number of inovirus isolates have been extensively studied. Here, we show that the current 56 members of the Inoviridae family represent a minute fraction of a highly diverse group of inoviruses. Capturing this previously obscured component of the global virosphere may spark new avenues for microbial manipulation approaches and innovative biotechnological applications.

Minimum Information about an Uncultivated Virus Genome (MIUViG)

We present an extension of the Minimum Information about any (x) Sequence (MIxS) standard for reporting sequences of uncultivated virus genomes. Minimum Information about an Uncultivated Virus Genome (MIUViG) standards were developed within the Genomic Standards Consortium framework and include virus origin, genome quality, genome annotation, taxonomic classification, biogeographic distribution and in silico host prediction. Community-wide adoption of MIUViG standards should enable more robust comparative studies and a systematic exploration of the global virosphere.

Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses

Ocean microbes drive biogeochemical cycling on a global scale. However, this cycling is constrained by viruses that affect community composition, metabolic activity, and evolutionary trajectories. Here we assemble complete genomes and large genomic fragments from both surface- and deep-ocean viruses sampled during the Tara Oceans and Malaspina research expeditions, and analyse the resulting ‘global ocean virome’ dataset to present a global map of abundant, double-stranded DNA viruses complete with genomic and ecological contexts.

Contact