Almost three years after SARS-CoV-2 emerged, we still don’t know where the virus that causes COVID-19 came from.
The location of the initial outbreak near the Wuhan Institute of Virology raised suspicions that the virus may have escaped from a laboratory. But scientists have largely come out in favor of a natural spillover from bats to humans, via an intermediate animal host, at the Huanan Seafood Market a few miles away. To date, however, no immediate ancestors of SARS-CoV-2 have been found in bats or in any other animals sold at the market.
A recent preprint (a study that has not yet been peer-reviewed) claims to have identified possibly unusual sequence patterns in the SARS-CoV-2 genome. These patterns may indicate that the virus has been genetically modified in a laboratory.
It should be emphasized that any realistic scenario of laboratory origin would indicate an accidental escape, not nefarious intent. Viruses have no application as biological weapons in the modern world. They are difficult to mass produce and deploy. They take days to be effective, and if capable of human-to-human transmission, they are likely to spread to unintended populations, including friendly forces.
The preprint was poorly received by most experts in the field, with many reacting to it on social media.
This reception is largely unsurprising. Scientists and members of the general public often hold strong opinions about the origin of SARS-CoV-2, even though all the available evidence remains weak and circumstantial. In the absence of solid facts, opinions are bound to be based largely on emotions and group affiliation, especially when the stakes are considered so high.
The genomes of all organisms, including SARS-CoV-2, are formed from long stretches of four different nucleotides (A, T, G, and C; in RNA, U takes the place of T). These are the building blocks of RNA and DNA.
Large viral genomes, such as those of coronaviruses, can be cut into smaller pieces, or fragments, which can be mixed and matched to study the effect of different genes and mutations. Scientists could do this, for example, to understand which genes or mutations might increase the risk of a virus spreading to humans.
The standard method for cutting viral genomes into smaller pieces is to use restriction enzymes, sometimes called molecular scissors. Restriction enzymes recognize and cut specific nucleotide sequences (e.g., GAATTC). Of approximately 3,000 different restriction enzymes, only a relatively small number are commonly used to manipulate viral genomes. Among these are type IIS enzymes.
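To make the idea of a recognition sequence concrete, here is a minimal sketch in Python that locates every occurrence of a motif in a nucleotide string. The function name `find_sites` and the toy sequence are illustrative assumptions, not taken from the preprint. A real enzyme also cuts at a defined position within or near the site, which this sketch ignores.

```python
def find_sites(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start position of every occurrence of motif.

    Overlapping occurrences are included, because the search resumes
    one position after each hit rather than after the end of the hit.
    """
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Made-up toy sequence containing the example motif GAATTC twice.
toy_sequence = "ATGGAATTCCGTAGAATTCTTA"
print(find_sites(toy_sequence, "GAATTC"))  # prints [3, 13]
```

The same search, run with the recognition sequences of other enzymes, yields the list of restriction sites whose spacing along a genome can then be examined.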
The preprint claims that in the SARS-CoV-2 genome, the distribution of certain restriction sites (the places where the genome may have been cut and joined) is “abnormal” and consistent with the virus having been stitched together from several smaller fragments using type IIS enzymes called BsaI and BsmBI.
Notably, the restriction sites exhibited an excess of silent mutations. These are nucleotide changes that do not affect the characteristics of the virus and may be hallmarks of genetic engineering.
When slicing and assembling genomes using type IIS enzymes, scientists can seamlessly erase all restriction site imprints through a method called “golden gate assembly.”
Thus, for the distribution of type IIS restriction sites in SARS-CoV-2 to be interpreted as a signature of engineering, the sites would have had to be left in place intentionally. While not entirely implausible, this is not common practice, and scientists have wondered what reason there would be to leave these sites behind.
Questions have also been raised around some of the mathematical measurements on which the authors’ conclusions are based, in particular the presumed maximum length of individual viral fragments. Meanwhile, the analysis was criticized for only considering the two type IIS restriction enzymes commonly used in this context.
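The fragment-length measure under discussion can be illustrated with a short Python sketch. It is not the preprint's code: the helper names, the toy sequence, and the simplification of treating the start of each recognition site as the cut point are all assumptions made for illustration. The recognition sequences GGTCTC (BsaI) and CGTCTC (BsmBI) are not palindromes, so both strands are searched by also looking for their reverse complements.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_positions(sequence: str, motifs: list[str]) -> list[int]:
    """Sorted start positions of each motif or its reverse complement."""
    positions = set()
    for motif in motifs:
        for pattern in (motif, reverse_complement(motif)):
            start = sequence.find(pattern)
            while start != -1:
                positions.add(start)
                start = sequence.find(pattern, start + 1)
    return sorted(positions)


def fragment_lengths(genome_length: int, cut_positions: list[int]) -> list[int]:
    """Lengths of the pieces obtained by cutting a linear genome.

    Simplifying assumption: each cut falls at the start of a site.
    """
    boundaries = [0] + sorted(cut_positions) + [genome_length]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


TYPE_IIS_MOTIFS = ["GGTCTC", "CGTCTC"]  # BsaI and BsmBI recognition sequences

# Made-up 40-nucleotide toy genome with three sites on either strand.
toy_genome = "AAAA" + "GGTCTC" + "A" * 10 + "GAGACG" + "A" * 6 + "CGTCTC" + "AA"

cuts = site_positions(toy_genome, TYPE_IIS_MOTIFS)
lengths = fragment_lengths(len(toy_genome), cuts)
print(cuts)          # prints [4, 20, 32]
print(lengths)       # prints [4, 16, 12, 8]
print(max(lengths))  # prints 16
```

The longest fragment is the quantity of interest here, because pieces that are too long are harder to handle in the laboratory. The threshold chosen for "too long" is one of the contested points.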
All of these highly technical points of contention illustrate the difficulty of formulating satisfactory and testable hypotheses for complex questions.
What are the chances?
The study also explored how easily the restriction site distribution pattern seen in SARS-CoV-2 could be generated by chance (as opposed to engineering). The researchers simulated a process of random mutations from two close relatives of SARS-CoV-2. The probability of generating the same pattern was low – 0.1% and 1.2%.
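The general shape of such a simulation can be sketched in Python. This is a toy illustration under stated assumptions, not the preprint's method: the ancestor sequence is random rather than a real coronavirus genome, mutations are uniform single-nucleotide substitutions, and the "pattern" is reduced to the total number of restriction sites. All function names and parameter values are hypothetical.

```python
import random

BASES = "ACGT"
# BsaI and BsmBI recognition sequences plus their reverse complements.
MOTIFS = ["GGTCTC", "GAGACC", "CGTCTC", "GAGACG"]


def mutate(sequence: str, n_mutations: int, rng: random.Random) -> str:
    """Apply n_mutations random substitutions at uniformly chosen positions."""
    seq = list(sequence)
    for _ in range(n_mutations):
        pos = rng.randrange(len(seq))
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def count_sites(sequence: str, motifs: list[str]) -> int:
    """Count motif occurrences, including overlapping ones."""
    return sum(
        1
        for motif in motifs
        for i in range(len(sequence) - len(motif) + 1)
        if sequence.startswith(motif, i)
    )


def estimate_probability(ancestor, motifs, target_count, n_mutations,
                         n_trials, seed=0):
    """Fraction of simulated descendants with exactly target_count sites."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        descendant = mutate(ancestor, n_mutations, rng)
        if count_sites(descendant, motifs) == target_count:
            hits += 1
    return hits / n_trials


# Toy run: a random 1,000-nucleotide ancestor, 30 substitutions per trial.
setup_rng = random.Random(42)
ancestor = "".join(setup_rng.choice(BASES) for _ in range(1000))
observed = count_sites(ancestor, MOTIFS)
p = estimate_probability(ancestor, MOTIFS, observed, n_mutations=30, n_trials=200)
print(f"Estimated probability of keeping {observed} sites: {p:.3f}")
```

A low fraction means the chosen pattern rarely arises under the simulated process. Whether that process is a fair model of how coronaviruses actually evolve is a separate question.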
Again, this analysis has been criticized. Coronaviruses can naturally gain and lose restriction patterns by accumulating mutations, but also through different viral strains exchanging genetic material, a process called genetic recombination.
As coronaviruses undergo frequent genetic recombination, a simulation that combines recombination and mutation events may arguably be better suited to answering this question.
This criticism is fair, but partly overlooks the fact that unusual patterns can be informative even if the process that generated them remains unknown. A single black sheep in a herd of 1,000 stands out, whether its coat color is caused by an unusual genetic makeup or because it fell into a tar barrel.
The evidence reported in the preprint is neither conclusive nor definitive. These results may turn out to be a coincidence or generated by a flaw in the method. The authors were broadly open about some limitations of their work and invited comments and criticism.
Even if the results can be replicated by others and remain valid once additional data has been analyzed, this study is unlikely to change many opinions. At best, or at worst, depending on prior beliefs, these results will add only a grain of weak and circumstantial evidence to the debate.
The reception of the work raises difficult questions. Some experts believe it is unwise to discuss any evidence supporting a lab leak, as it could fuel conspiracy theories. However, a public perception that existing evidence may be subject to censorship is even more likely to have this effect. Notably, China has been largely uncooperative in investigating the origin of the virus.
The nightmare scenario for me would not be the eventual confirmation of an accidental lab leak, but the confirmation of a lab leak whose evidence has been aggressively suppressed.
This article is republished from The Conversation under a Creative Commons license. Read the original article.
François Balloux receives funding from the Wellcome Trust, the NIHR and the EU 2020 framework programme.