Metagenomics: a new way to study biodiversity at the microscopic level

Tania Louis
PhD in biology and Columnist at Polytechnique Insights
Key takeaways
  • Metagenomics is a technique that combines molecular biology and computer science to study the entire microbial world.
  • Genomes can thus be analysed at the level of an entire sample to characterise complete ecosystems.
  • A metagenomic study is carried out in two main stages: sample recovery and sequencing.
  • Sequencing genomes requires a great deal of expertise in bioinformatics, which makes the development of metagenomics inseparable from the development of Big Data.
  • Although metagenomics is costly and cumbersome to set up, it is nevertheless promising and allows us to discover a still unknown microscopic biodiversity.

Their small size makes them difficult to perceive, but micro-organisms are by far the most numerous entities on our planet. Bacteria, archaea, viruses, fungi, and other tiny eukaryotes are present almost everywhere and form ecosystems that escape both our eyes and our test tubes, since it is estimated that only 1–2% of micro-organisms are easily cultivated in the laboratory. However, it is now possible to study the entire microbial world thanks to a technique that combines molecular biology and computer science: metagenomics.

Deconstructing genomes

As this term, coined in 1998 [1], indicates, the general idea is to analyse genomes at the level of an entire sample rather than at the level of an individual or a species, as was previously the case. This gives access to all the micro-organisms the sample contains, including those that we do not know how to grow in culture, and makes it possible to characterise complete ecosystems. However, although recent technological advances have made metagenomics a fast-growing approach, its implementation remains complex.

Let’s take a step back to put things in perspective. The first genome to be sequenced, in 1977, was that of a bacteriophage, a virus that infects bacteria, which measured about 5,300 nucleotides [2]. Bacteria [3] and yeast [4] followed, and finally the human genome: published in the early 2000s, it took hundreds of millions of euros and years of work to decipher most of its 3 billion nucleotides [5]. The first truly complete sequence of a human genome was only published in April 2022 [6]!

Sequencing is therefore a relatively recent technique that is constantly improving, so much so that it is now possible to sequence a human genome with satisfactory quality for only €1,000, in a single day. There are in fact several so-called ‘next-generation’ sequencing techniques, varying in accuracy, speed and cost, and it is now possible to recover millions or even billions of sequences in parallel and so analyse tens of billions of nucleotides every day. This is the first technological advance that allows the simultaneous study of the genomes of communities of micro-organisms, but it is not the only one.

Sequencing so many nucleotides generates a large volume of digital data, which must then be processed. The development of metagenomics is therefore taking place in parallel with that of “Big Data”. Storage, computing capacity, tool development, database management: making genomes talk requires equipment and solid skills in bioinformatics.

Metagenomics is therefore at the crossroads of two rapidly evolving fields, and its potential continues to increase. It may be tempting to see it as the new Holy Grail of microbiology, allowing us to discover a microscopic world that has so far eluded us. However, this approach remains cumbersome, costly, and fraught with pitfalls. Before using it, it is best to have a well-defined question to answer and to refine the protocol to avoid being buried under a heap of unusable data.

Example of a flow cell used for massively parallel sequencing: thousands of pieces of DNA are attached to the cell and sequenced simultaneously. Photo by Eplisterra.

Metagenomics step-by-step

The first step in a metagenomic study is to collect samples. Whether we are interested in the micro-organisms found in soil, water or human microbiota, we need to work on samples that are suited to the question we are asking, that are comparable (the composition of the soil will not be the same in different places, at different depths or during different seasons, for example), that are sufficiently numerous and diverse to be representative, and that are large enough to yield the quantities of DNA necessary for the rest of the protocol.

Different processes can be used to extract this DNA, and the protocol is optimised according to the medium of origin, the types of organisms of interest and the material to be recovered. Sample preparation is also the opportunity to sort the organisms studied (for example by filtering to keep only those of a certain size) and to select the type of nucleic acids that will be sequenced later. In particular, it is possible to purify messenger RNA (mRNA) rather than genomic DNA to analyse the actual activity of a microbial community: this is known as metatranscriptomics rather than metagenomics.

Next comes the sequencing stage, with two possible approaches: targeted or global metagenomics. Targeted metagenomics is mainly used to identify and classify the species present in a sample. In this case, only certain parts of the genomes, considered specific to a particular type of organism or range of functions, are amplified, sequenced, and analysed. Global metagenomics, on the other hand, enables the fine characterisation of communities of micro-organisms, but is more cumbersome to implement. It consists of recovering all the DNA contained in a sample, fragmenting it to obtain pieces short enough to be sequenced, sequencing all these portions of genomes, and then reconstructing the original genomes as well as possible.
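To make the fragment-then-reconstruct idea more concrete, here is a minimal Python sketch. It is a toy illustration, not how real assemblers work: the two "genomes" are short random strings, the reads are error-free and evenly spaced, and the function names (`fragment`, `overlap`, `greedy_assemble`) and parameters (`read_len`, `step`, `min_len`) are invented for this example. It simply merges, again and again, the two sequences that share the longest overlap.

```python
import random


def fragment(genome, read_len, step):
    """Cut one genome into overlapping reads of fixed length."""
    last = len(genome) - read_len
    starts = sorted(set(range(0, last, step)) | {last})
    return [genome[s:s + read_len] for s in starts]


def overlap(a, b, min_len):
    """Longest suffix of `a` that is a prefix of `b`, or 0 if shorter than min_len."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len):
    """Merge the best-overlapping pair of sequences until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Two made-up "genomes" standing in for two organisms in the same sample.
rng = random.Random(0)
genomes = ["".join(rng.choice("ACGT") for _ in range(90)) for _ in range(2)]

# Fragment both genomes and shuffle the reads together, as in a mixed sample.
reads = [r for g in genomes for r in fragment(g, read_len=30, step=15)]
rng.shuffle(reads)

contigs = greedy_assemble(reads, min_len=15)
print(len(reads), "reads assembled into", len(contigs), "contigs")
print("Original genomes recovered:", sorted(contigs) == sorted(genomes))
```

Real samples are far less forgiving: reads contain errors, coverage is uneven, some fragments are lost, and related organisms share long stretches of sequence, which is why the reconstruction step is so much harder in practice than in this sketch.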

This is like taking several jigsaw puzzles, shuffling all the pieces (with some loss) and then trying to put each puzzle back together from this disparate pile. For organisms whose genomes are already recorded, this is relatively easy because we have models to follow. It is more difficult for unknown organisms, which may represent 90% of some samples [7]. Tricks have been devised to make this puzzle easier to solve [8], for example combining different metagenomic datasets to search for fragments with comparable copy numbers (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4111155/). Even so, most microscopic biodiversity is still unknown to us: metagenomics is only beginning to explore it, starting by measuring the extent of our ignorance.
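The copy-number trick can be sketched in a few lines of Python. The idea is that fragments from the same organism should rise and fall together across samples, because they are present in the same number of copies. Everything below is an assumption made for illustration: the coverage figures are invented, the names `pearson` and `bin_by_coabundance` and the 0.95 threshold are hypothetical, and real binning tools rely on considerably more elaborate statistics.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length coverage profiles.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def bin_by_coabundance(profiles, threshold=0.95):
    """Greedily group fragments whose profile correlates with a bin's first member."""
    bins = []
    for name, profile in profiles.items():
        for members in bins:
            if pearson(profile, profiles[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            # No existing bin matched: this fragment starts a new one.
            bins.append([name])
    return bins


# Invented sequencing depth of five assembled fragments in four samples.
coverage = {
    "contig_1": [10, 40, 5, 20],
    "contig_2": [12, 41, 6, 19],
    "contig_3": [30, 8, 25, 3],
    "contig_4": [9, 38, 4, 22],
    "contig_5": [31, 7, 27, 2],
}

bins = bin_by_coabundance(coverage)
for number, members in enumerate(bins, start=1):
    print(f"bin {number}: {members}")
```

With these made-up numbers, the fragments split into two groups, each of which would be a candidate genome to reconstruct separately. The approach only works when several samples are available and when the organisms' abundances actually differ from one sample to the next.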

Metagenomics and bioprospecting

However, this approach is not only descriptive: it also opens up new possibilities for identifying active microbial compounds. After fragmenting the genomes present in a sample, we can produce bacteria that each contain one of the resulting pieces of DNA and see whether any of them acquire interesting properties (exploiting a particular source of energy, degrading particular compounds, having antibiotic activity, etc.). All this without culturing, or even identifying, the organisms that possessed this ability in the first place!

Beyond fundamental research, the functional side of metagenomics therefore broadens the field of bioprospecting. It is still metagenomics, and therefore costly and cumbersome to set up, but it will develop as technology advances. The existence of direct applications in fields as fundamental as medicine and agronomy is another reason to follow the progress of metagenomics and the associated discoveries in the years to come.

[1] https://www.cell.com/cell-chemical-biology/pdf/S1074-5521(98)90108-9.pdf
[2] https://www.nature.com/articles/265687a0
[3] https://pubmed.ncbi.nlm.nih.gov/7542800/
[4] https://pubmed.ncbi.nlm.nih.gov/8849441/
[5] https://www.genome.gov/human-genome-project
[6] https://www.medecinesciences.org/en/articles/medsci/full_html/2022/06/msc220104/msc220104.html
[7] https://www.sciencedirect.com/science/article/abs/pii/S0168170216308012
[8] Looking for patterns in viral sequences embedded in the genomes of other organisms that have been sequenced: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6966834/
