Genetic analysis shows how a single ‘superspreading’ event sent coronavirus across the nation

None of the biotech executives at the meeting noticed the uninvited guest. They had flown to Boston from across the globe for the annual leadership meeting of the drug company Biogen, and they were busy catching up with colleagues and hobnobbing with upper management. For two days they shook hands, kissed cheeks, passed each other the salad tongs at the hotel buffet, never realizing that one among their number carried the coronavirus in their lungs.

By the meeting's end on Feb. 27, the infection had infiltrated many more people: a research director, a photographer, the general manager for the company's east division. They took the virus home with them to the Boston suburbs, Indiana and North Carolina, to Slovakia, Australia and Singapore.

Over the following two weeks, the virus that circulated among conference attendees was implicated in at least 35 new cases. In April, the same distinctive viral sub-strain swirled through two Boston homeless shelters, where it infected 122 residents.

Scientists know all this thanks to a mistake made during the coronavirus's replication process - a simple switch of two letters in the virus's 30,000-character genetic code. This mutation appeared in two elderly patients in France at almost exactly the same time that genetically matching viruses were sickening dozens of people at the Biogen meeting. After the conference, each time the infection spread, the mutation spread with it.

Now, a sweeping study of nearly 800 coronavirus genomes, conducted by no fewer than 54 researchers at the Broad Institute, Massachusetts General Hospital, the Massachusetts Department of Public Health and several other institutions in the state, has found that viruses carrying the conference’s characteristic mutation infected hundreds of people in the Boston area, as well as victims from Alaska to Senegal to Luxembourg. As of mid-July, the variant had been found in about one-third of the cases sequenced in Massachusetts and 3 percent of all genomes studied thus far in the United States.

The study, which was posted Tuesday to the preprint website medRxiv, is probably the largest genomic analysis of any U.S. outbreak so far and is among the most detailed looks at how coronavirus cases exploded in the pandemic’s first wave.

It documents the cost of the world's naivete this spring, when people traveling for events like the Biogen conference unwittingly imported the virus into Massachusetts dozens of times. It reveals the connections between seemingly disparate communities, showing how an outbreak at a gathering of wealthy executives was only a few infections removed from sickening some of Boston's most vulnerable residents. It highlights the outsize role of indoor "superspreading events" in accelerating and sustaining transmission. With genetic data, said co-author Bronwyn MacInnis, "a record of our poor decisions is being captured in a whole new way."

Although the study must undergo the rigors of peer review before it is published in a scientific journal, both outside experts and the scientists involved say it shows the power and promise of an emerging field of research known as genomic epidemiology. The small mutations that accumulate in a virus's genome are like genetic bar codes; by tracking them, researchers can trace infections to their sources and develop more effective interventions to stop the disease.

"This is the kind of study that . . . defines why genomics can be so useful in outbreak reconstruction," said Vaughn Cooper, a microbiologist at the University of Pittsburgh who was not involved in the Boston research. "It reflects a great deal of coordinating work, and that's what in part makes this so powerful."

But if the new research shows the potential of genomic surveillance to unveil the path of the virus through communities, it is also exceptional for the sheer volume of data it contains. In the United States, such sophisticated genetic tracking has been “patchy, typically passive, reactive, uncoordinated, and underfunded,” experts at the National Academies of Sciences, Engineering and Medicine wrote in a lengthy report last month. Advocates for the cutting-edge technique say more coordinated and comprehensive sequencing efforts could dramatically improve contact tracing and infection control.

As the nation flounders ahead of a second wave of infections, the study serves as both a portent and an opportunity, MacInnis said. The virus's genome may continue to record the consequences of the nation's failures - the too-large gatherings and too-fast reopenings, the testing shortages and lack of protective equipment, and the silent spread.

Or it may answer lingering questions about how the virus is transmitted. It may provide the insights that finally allow workplaces to reopen and schools to safely resume. The virus's own genetic instruction manual may be "invaluable" for teaching us to control the pandemic, MacInnis said - but only if we are willing to heed its lessons.

The anatomy of an outbreak

On the day the Biogen meeting was set to begin, 15 cases of covid-19 had been diagnosed within the United States, nearly all of them among travelers or their close contacts. The Centers for Disease Control and Prevention had just acknowledged an instance of possible “community spread” - an infection without an obvious source. Vice President Pence was to lead a coronavirus task force, and President Trump declared that the risk to Americans was “very low.”

Just like organizers of Mardi Gras in New Orleans, the Democratic primary in South Carolina, and the U.S. Mixed Doubles Curling Championship in Bemidji, Minn. - all of which were held the same week - those coordinating the Biogen conference saw no reason to change plans.

In a statement to The Washington Post, a Biogen spokeswoman pointed out that the company was following all U.S. guidelines at the time and notified health officials as soon as it realized attendees had gotten sick.

"February 2020 was nearly a half year ago, and was a period when general knowledge about the coronavirus was limited," said Anna Robinson, the company's head of U.S. media relations. "We never would have knowingly put anyone at risk."

Biogen has since announced a collaboration with the Broad Institute and Partners HealthCare to compile biological data that could help battle the disease.

The analysis of virus sequences shows that the coronavirus was introduced into Boston and its surrounding area more than 80 separate times by international and domestic travelers - most of whom were probably unaware of the germs they carried.

"We didn't know better," said Jacob Lemieux, a physician and infectious-disease expert at Massachusetts General Hospital and lead author of the study. "The difference now is there is increasing scientific evidence to show what can happen from a single event like that. We do know better. So we need to learn the lesson."

The Boston event opened with breakfast at the Marriott Long Wharf hotel's ballroom overlooking the wintry, gray harbor. Roughly 175 people were there, including guests from Italy, where officials had recently locked down more than a dozen towns in an effort to contain the country's 400 cases.

Everything that felt so normal about the meeting seems sinister in retrospect, said Lara Woolfson, a Boston-based photographer who had been hired to document the conference. In a Facebook live video posted in March, Woolfson reflected on all the doorknobs she’d touched, the strangers she’d sat beside.

In the days that followed, dozens of attendees developed flu-like symptoms, according to the Boston Globe. By March 4, the company was instructing everyone who’d gone to the meeting to self-quarantine. The next day, Biogen confirmed that three out-of-state attendees had been diagnosed with covid-19; genetic data show that at least 12 others were sick by that point.

But Woolfson knew nothing about her potential exposure until a friend texted her a news article about the outbreak. Suddenly the dry cough and mild ache she'd been feeling seemed serious enough to call her doctor, who immediately sent her to the ER for testing. She was positive.

The Massachusetts Department of Public Health ultimately identified 97 coronavirus cases among meeting attendees and people who lived with them. Every individual linked to the conference whose genome was sequenced - 28 people in total - carried the conference's characteristic mutation. It was dubbed "C2416T" for its location at the 2,416th spot on the genome and the two nucleotides, cytosine (C) and uracil (written as T, for thymine, by sequencing convention), that got switched.
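
For readers curious how such a label is read, the naming scheme can be sketched in a few lines of Python. This is only an illustration: the names `Mutation`, `parse_mutation` and `carries` are hypothetical, and the sequences are toy strings rather than real viral genomes.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    ref: str  # base found in the reference genome
    pos: int  # 1-based position on the genome
    alt: str  # base that replaced it


def parse_mutation(label: str) -> Mutation:
    """Split a label such as 'C2416T' into its three parts."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGT])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ref, pos, alt = match.groups()
    return Mutation(ref, int(pos), alt)


def carries(genome: str, mutation: Mutation) -> bool:
    """True if the genome shows the new base at the mutated position."""
    if mutation.pos < 1 or len(genome) < mutation.pos:
        return False
    return genome[mutation.pos - 1] == mutation.alt


# Toy sequences only (hypothetical, not real viral genomes).
marker = parse_mutation("C2416T")
reference_like = "A" * 2415 + "C" + "A" * 10
conference_like = "A" * 2415 + "T" + "A" * 10

print(marker)
print(carries(reference_like, marker), carries(conference_like, marker))
```

Real analyses first align each sample to a reference genome so that position numbers line up; the sketch assumes that step has already been done.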

Sequencing also revealed how the coronavirus evolved even as the Biogen conference was going on. About a quarter of attendees were sickened by a virus whose genome contained both the C2416T mutation and a second mutation, G26233T. In one case, the scientists found both versions of the virus replicating in a single set of lungs.

This shows that the G26233T variant is a descendant of the germ that originally arrived at the meeting, Lemieux said, an imperfect clone that wound up giving rise to its own distinct lineage.

The conference, the new study finds, amplified both variants, turning what might have been just one more introduction of the virus into a "superspreading event."

About a month later, more than 600 residents and staff members at two of Boston's biggest homeless shelters were tested as part of a universal screening effort. Officials were shocked to discover that 230 people were already infected with the coronavirus, the large majority of them asymptomatic.

Genetic analysis showed that nearly two-thirds of sequenced infections among shelter residents could be traced back to the conference.

"Our jaws dropped," said Pardis Sabeti, a computational biologist at the Broad Institute and one of the lead researchers on the study. "It was the realization that these events really affect the most vulnerable among us."

Scientists can only speculate as to exactly how the infection among biotech executives made its way into Boston's homeless community. But that's precisely the point of genomic epidemiology, Sabeti said. Genetic data can reveal connections no one thought to look for, helping health officials seek out and sever chains of transmission.

There's another lesson in the data, said James O'Connell, a professor of medicine at Harvard Medical School and the founder and president of the Boston Health Care for the Homeless Program: Packed shelters, like the conference, provide ideal conditions for superspreading.

"And by the time we realized how bad it was, so much asymptomatic spread had happened that it was too late," O'Connell said.

The findings match what has been observed on a smaller scale in other studies, said Dave O'Connor, a virologist at the University of Wisconsin at Madison. Superspreading events, which provide the virus with huge numbers of hosts in a small amount of time, are driving the global outbreak. Delays in returning test results make it much more difficult to mitigate their effects; by the time those infected in such events know they're sick, they have probably infected many more people.

"Right now, there are almost certainly people sparking new transmission clusters," O'Connor said.

If the United States continues to repeat the mistakes of February, he added, the same patterns of transmission will play out over and over again.

‘Just throwing away the crown jewels’

Of the 5.7 million confirmed coronavirus cases in the United States, scientists have sequenced the genomes of 19,008, according to the Global Initiative on Sharing All Influenza Data (GISAID), the widely used international genome database. That's roughly 0.33 percent of the nation's epidemic.
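
The share quoted above follows directly from the two counts. A quick check in Python, using only the figures given in this article (the variable names are illustrative):

```python
# Figures as reported in the article.
confirmed_cases = 5_700_000
sequenced_genomes = 19_008

# Share of confirmed cases whose viral genome has been sequenced.
share_percent = sequenced_genomes / confirmed_cases * 100
print(f"{share_percent:.2f} percent of confirmed cases sequenced")
```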

Even though most tests work by detecting viral RNA in swabs from patients' airways, those samples are rarely studied further once doctors get a diagnosis.

But if it were up to MacInnis, every coronavirus sample collected in the United States would be sent to a genetics lab for sequencing. Each of those sequences would be analyzed and submitted to the GISAID database. The results would be shared with health officials and contact tracers, deepening their understanding of their local outbreaks.

"If you're spending whatever money large organizations seem to be putting into large-scale testing," MacInnis said, "throwing away that very same extracted [RNA] that could tell you about how cases are connected within your organization or within communities is just throwing away the crown jewels of what you really want to know."

Genetic insights could be "invaluable" for communities balancing the need to control the virus with the desire to reopen, MacInnis said. Suppose four students at an elementary school became sick. If genetic analysis showed they shared a common strain, the virus was most likely transmitted at school, suggesting the facility should close or at least conduct a thorough review of infection-control procedures. But if the infections were genetically unrelated, it's likely they independently contracted the illness elsewhere, in which case the students should stay home but the school could remain open.
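
The reasoning in that school scenario can be sketched roughly in Python. This is a toy illustration under loud assumptions: the sequences are made-up 20-letter strings, the function names are hypothetical, and the `max_differences` cutoff of 2 is an arbitrary placeholder rather than a value from the study. Actual investigations rely on full genomes and phylogenetic methods, not a simple count.

```python
from itertools import combinations
from typing import Mapping


def count_differences(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def likely_linked(genomes: Mapping[str, str], max_differences: int = 2) -> bool:
    """True if every pair of samples is within the assumed cutoff."""
    return all(
        count_differences(a, b) <= max_differences
        for a, b in combinations(genomes.values(), 2)
    )


# Hypothetical toy data: four students with near-identical sequences.
school_cluster = {
    "student_1": "ACGTACGTACGTACGTACGT",
    "student_2": "ACGTACGTACGTACGTACGT",
    "student_3": "ACGTACGTACTTACGTACGT",
    "student_4": "ACGTACGTACGTACGTACGT",
}

# Hypothetical toy data: one student's sequence differs at three positions.
separate_cases = dict(school_cluster, student_2="TCGTACGAACGTACGTTCGT")

print(likely_linked(school_cluster))
print(likely_linked(separate_cases))
```

A positive result here would only suggest a shared source; it could not by itself show where the transmission took place.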

"It's not testing that can answer that question," MacInnis said. "It's having genomic data to tell you whether they appear to be connected."

But individual scientists and the National Academies alike say that the United States currently lacks the resources to carry out surveillance in a comprehensive way. Other countries have devoted millions of dollars to sequencing a representative sample of cases, producing a comprehensive picture of their national outbreaks. In the United States, meanwhile, genetic investigations have been led largely by individual institutions or small regional coalitions like the one in Boston. The resulting genetic portrait is patchy - the global GISAID database of SARS-CoV-2 sequences currently contains 1,807 submissions from Michigan and just 13 from Alabama, for instance - and that makes it less useful, scientists say.

Even the Broad Institute, a leader in this kind of work, has been stymied by a shortage of funds. MacInnis said her team has had to stop sequencing entirely while they apply for new grants. The scientists have not been able to collect any samples from the resurgence of cases in Boston.

The National Academies' report calls for the Department of Health and Human Services to fund and coordinate widespread genomic surveillance of the coronavirus, as well as build a national infrastructure for recording and analyzing the resulting data.

That such a system has not been implemented already, Lemieux said, "is a failure of the federal government."

Without fast, widespread and coordinated sequencing, he said, scientists can produce only studies like the one out of Boston - “postmortems” on outbreaks that are already over.
