Metagenomics: a new way to study biodiversity at the microscopic level

Tania Louis
PhD in biology and Columnist at Polytechnique Insights
Key takeaways
  • Metagenomics is a technique that combines molecular biology and computer science to study the entire microbial world.
  • Genomes can thus be analysed at the level of an entire sample to characterise complete ecosystems.
  • A metagenomic study is carried out in two main stages: sample recovery and sequencing.
  • Analysing sequenced genomes requires a great deal of expertise in bioinformatics, which makes the development of metagenomics inseparable from the development of Big Data.
  • Although metagenomics is costly and cumbersome to set up, it is promising and reveals a microscopic biodiversity that is still largely unknown.

Their small size makes them difficult to perceive, but micro-organisms are by far the most numerous entities on our planet. Bacteria, archaea, viruses, fungi, and other tiny eukaryotes are present almost everywhere and form ecosystems that escape both our eyes and our test tubes: it is estimated that only 1–2% of micro-organisms are easily cultivated in the laboratory. However, it is now possible to study the entire microbial world thanks to a technique that combines molecular biology and computer science: metagenomics.

Deconstructing genomes

As the term, coined in 1998¹, indicates, the general idea is to analyse genomes at the level of an entire sample, rather than at the level of an individual or a species as was previously the case. This gives access to all the micro-organisms the sample contains, including those that we do not know how to grow in culture, and makes it possible to characterise complete ecosystems. However, although recent technological advances have made metagenomics a fast-growing approach, its implementation remains complex.

Let’s take a step back to put things in perspective. The first genome to be sequenced, in 1977, was that of a bacteriophage, a virus that infects bacteria, whose genome is about 5,300 nucleotides long². Bacteria³ and yeast⁴ followed, and finally the human genome: published in the early 2000s, it took hundreds of millions of euros and years of work to decipher most of its 3 billion nucleotides⁵. The first truly complete sequence of a human genome was only published in April 2022⁶!

Sequencing is therefore a relatively recent technique that is constantly improving, so much so that it is now possible to sequence a human genome with satisfactory quality for only €1,000, in a single day. There are in fact several so-called ‘next-generation’ sequencing techniques, varying in accuracy, speed and cost, and it is now possible to read millions or even billions of sequences in parallel and so analyse tens of billions of nucleotides every day. This is the first technological advance that allows the simultaneous study of the genomes of communities of micro-organisms… But it is not the only one.
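To give a sense of these orders of magnitude, here is a small back-of-the-envelope calculation. The genome size is the figure quoted above; the read length and the number of reads are assumptions chosen purely for illustration, not the specifications of any particular instrument.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each nucleotide is read: (reads x length) / genome size."""
    return num_reads * read_length / genome_size


HUMAN_GENOME = 3_000_000_000  # about 3 billion nucleotides (figure quoted in the text)
READ_LENGTH = 150             # assumed length of one sequenced fragment
READS = 600_000_000           # assumed number of fragments read in parallel

total = READS * READ_LENGTH
depth = mean_coverage(READS, READ_LENGTH, HUMAN_GENOME)
print(f"{total:,} nucleotides sequenced, mean depth {depth:.0f}x")
```

With these assumed values, each position of a human genome would be read about 30 times on average. In a metagenomic sample the same volume of data is shared between many genomes of very different abundance, so rare organisms are read far fewer times.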

Sequencing many nucleotides generates a large volume of digital data, which must then be processed. The development of metagenomics is therefore taking place in parallel with that of “Big Data”. Storage, computing capacity, tool development and database management: making genomes talk requires equipment and solid skills in bioinformatics.

Metagenomics is therefore at the crossroads of two rapidly evolving fields, and its potential continues to increase. It may be tempting to see it as the new Holy Grail of microbiology, allowing us to discover a microscopic world that has so far eluded us. However, this approach remains cumbersome, costly, and fraught with pitfalls. Before using it, it is best to have a well-defined question to answer and to refine the protocol to avoid being buried under a heap of unusable data.

Example of a flow cell used for massively parallel sequencing: thousands of pieces of DNA are attached to the cell and sequenced simultaneously. Photo by Eplisterra.

Metagenomics step-by-step

The first step in a metagenomic study is to collect samples. Whether we are interested in the micro-organisms found in soil, water or human microbiota, the samples must be suited to the question being asked, comparable with one another (the composition of soil differs between places, depths and seasons, for example), numerous and diverse enough to be representative, and large enough to yield the quantities of DNA needed for the rest of the protocol.

The genetic material is then extracted. Different processes can be used, and the protocol is optimised according to the medium of origin, the types of organisms of interest and the material to be recovered. Sample preparation is also the opportunity to sort the organisms studied (for example by filtering to keep only those of a certain size) and to select the type of nucleic acids that will be sequenced later. In particular, it is possible to purify messenger RNA (mRNA) rather than genomic DNA to analyse the actual activity of a microbial community: this is known as metatranscriptomics rather than metagenomics.

Next comes the sequencing stage, with two possible approaches: targeted or global metagenomics. Targeted metagenomics is mainly used to identify and classify the species present in a sample. In this case, only certain parts of the genomes, considered specific to a particular type of organism or range of functions, are amplified, sequenced, and analysed. Global metagenomics, on the other hand, enables the fine characterisation of communities of micro-organisms, but is more cumbersome to implement. It consists of recovering all the DNA contained in a sample, fragmenting it to obtain pieces short enough to be sequenced, sequencing all these portions of genomes, and then reconstructing the original genomes as well as possible.
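The global approach can be illustrated with a deliberately tiny sketch: two made-up "genomes" are cut into overlapping pieces, the pieces are mixed, and a naive greedy procedure glues back together the sequences that overlap the most. Everything here (the sequences, the function names `fragment`, `overlap` and `assemble`, the fragment size and the minimum overlap) is hypothetical and chosen for readability; real assembly software relies on far more elaborate methods and has to cope with sequencing errors, repeated regions and uneven abundance.

```python
import random


def fragment(genome: str, size: int = 8, step: int = 4) -> list[str]:
    """Cut a genome into overlapping reads of `size` nucleotides."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, step)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if too short)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads: list[str], min_len: int = 4) -> list[str]:
    """Greedily merge the two sequences with the largest overlap until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return sorted(contigs)
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Two invented genomes that share no 4-nucleotide word, so every overlap is unambiguous.
genomes = ["ATGGCCTTAAGCGTCA", "TTGACGGATCCATGAG"]
reads = [read for genome in genomes for read in fragment(genome)]
random.Random(0).shuffle(reads)  # mix the pieces of both genomes

contigs = assemble(reads)
print(contigs)
```

In this toy case the two original sequences are recovered because the invented genomes were designed to have no repeated motif. As soon as different genomes share stretches of sequence, or some fragments are missing, a greedy strategy like this one can join pieces that do not belong together.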


This is like taking several jigsaw puzzles, shuffling all the pieces (with some loss) and then trying to put each puzzle back together from this disparate pile. For organisms whose genomes are already recorded, this is relatively easy because we have models to follow. It is more difficult for unknown organisms, which may represent 90% of some samples⁷. Tricks have been devised to make it easier to solve this puzzle⁸, for example combining different metagenomic datasets to search for fragments with comparable copy numbers (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4111155/), but most of the microscopic biodiversity is still unknown to us: metagenomics is only beginning to chart it, starting by measuring the extent of our ignorance.
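The idea of grouping fragments by comparable copy numbers can be sketched as follows. This is a simplified illustration and not the method of the cited study: the coverage values are invented, the names `pearson` and `bin_by_coabundance` are hypothetical, and a plain Pearson correlation with an arbitrary threshold of 0.9 stands in for whatever statistics a real tool would use. The assumption is that fragments from the same organism become more or less abundant together from one sample to the next.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two abundance profiles of equal length."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def bin_by_coabundance(coverage: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Put each fragment in the first group whose founding member it co-varies with."""
    bins: list[list[str]] = []
    for name, profile in coverage.items():
        for members in bins:
            if pearson(profile, coverage[members[0]]) > threshold:
                members.append(name)
                break
        else:
            bins.append([name])  # no similar group found: start a new one
    return bins


# Invented example: how many times each assembled fragment was read in four samples.
coverage = {
    "contig_1": [10, 40, 5, 20],
    "contig_2": [12, 38, 6, 22],
    "contig_3": [30, 3, 25, 8],
    "contig_4": [11, 42, 4, 19],
    "contig_5": [28, 5, 27, 9],
}

bins = bin_by_coabundance(coverage)
for members in bins:
    print(members)
```

On this invented data, fragments 1, 2 and 4 rise and fall together and end up in one group, while fragments 3 and 5 form another: two candidate genomes are separated without any reference model.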

Metagenomics and bioprospecting

This approach is not only descriptive: it also opens up new possibilities for identifying active microbial compounds. After fragmentation of the genomes present in a sample, we can produce bacteria that each contain one of the pieces of DNA obtained and see whether any of them acquire interesting properties (using a particular energy source, degrading particular compounds, showing antibiotic activity, etc.). All this without culturing, or even identifying, the organisms that had this ability in the first place!

Beyond fundamental research, the functional side of metagenomics therefore broadens the field of bioprospecting. It is still metagenomics, and so still costly and cumbersome to set up… But it will develop as technology advances. The existence of direct applications in fields as fundamental as medicine and agronomy is another reason to follow the progress of metagenomics and the associated discoveries in the years to come.

8. Looking for patterns in viral sequences embedded in the genomes of other organisms that have been sequenced: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6966834/
