Sequencing the World

It looks like the beginnings of a consortium are taking shape, with the goal of sequencing all life on earth. As something of a genomicist, I am psyched by the goal, unattainable as it may be. I also want to say why lofty goals are helpful, and why this one will be too.

The Human Genome Project took years to finish and ended up costing about a dollar per base pair; base pairs are the chemical “letters” that make up the genetic code. Since then, sequencing has become orders of magnitude cheaper. The current genome sequencing leader, Illumina, famously announced that sequencing a genome could be done for a thousand dollars. If we compare that to the investment required for the human sequence, we certainly have made strides. This progress is due to improvements in the technology we use to sequence genomes. The most popular way to do it today is to take a sample of DNA from an organism, where it is typically present in long stretches called chromosomes, and break it into short fragments. Since we have a lot of DNA in the sample, we end up having more than one copy of each letter of the genome. Using the powerful genome sequencers that we have developed, we can sequence a little bit of each of these fragments before using a computer program to take the short reads and assemble them into a contiguous sequence. If you can imagine taking a few hundred copies of “Moby Dick” and randomly cutting out stretches of letters before trying to reassemble the book from the fragments by looking for overlap between random fragments, then you understand the basic strategy that genome sequencing uses today.
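To make the “Moby Dick” analogy concrete, here is a minimal sketch of overlap-based assembly in Python. It is a toy under stated assumptions, not how real assemblers work internally: the function names (`overlap`, `greedy_assemble`) are my own, the reads are assumed to be error-free, and the greedy merge-the-best-overlap strategy is just the simplest thing that illustrates the idea.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the largest overlap.

    Returns the fragments left over once no pair overlaps by at
    least `min_len` characters (ideally a single contiguous sequence).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        # Glue the pair together, keeping the shared stretch only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three shuffled "reads" cut from the opening of the book.
reads = ["me ishm", "ishmael", "call me i"]
print(greedy_assemble(reads))  # ['call me ishmael']
```

Even this toy hints at why repeats are a headache: if the same stretch of letters shows up in several places, more than one pair of fragments overlaps equally well and the program has no way to know which merge is the right one.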

In spite of the cutting-edge technology, it still takes a ton of work to go from a draft genome assembly–which is what you could immediately get after putting a thousand dollars into an Illumina machine and plugging the resulting reads into the computer to assemble–to the kind of gold-standard genome assemblies that we have in well-studied organisms like mice and humans. Typically, more work has to be put in to fill in gaps in the assembly that result from highly repetitive DNA, which confounds assemblers. Scientists sometimes have to do follow-up experiments to prove that their genome assembly is real and is not just a computer error. Finally, the genome sequence is useless until you start to figure out where the genes and other features lie. This means more follow-up experiments and comparing the genome to those of other related organisms.

All of this takes a significant investment of time and treasure, and there is no way that we could do that for all life on earth. You would never be able to have a gold-standard genome assembly for every organism on earth. Much like the oft-told anecdote about restaurants in New York City–where it is said that you could never eat at every restaurant in the city because new ones are opening for business and going out of business faster than you could visit them all–new organisms are evolving and going extinct all of the time. The idea of putting in enough work to get something as polished as the fruit fly genome, let alone the mouse or human genome, is laughable if you start to think about it. But even a rougher attempt would allow researchers to gain an appreciation for the diversity of life that exists on earth, specifically at the DNA level. Just having fractions of the genomes of most of the species on earth would allow us to better understand the evolutionary relationships between all life on earth.

As for this goal being a little too big to handle, big goals are important to push us to new heights. Getting to the moon seemed ridiculous at the time, and sequencing the human genome was impossible when we first started to plan how to do it. These goals ended up being attainable, but just imagine if they had not been. Even if we had never made it to the moon, we would have still developed the kind of technology that allowed us to put satellites into orbit, satellites that our ubiquitous mobile devices now rely on. Even if the human genome had proved intractable, we would have still ended up with improved sequencing technology. This is because setting these lofty goals has the effect of pushing us to achieve things that we would have never thought to accomplish without a lofty goal. If we set out to sequence all life on earth, just imagine what we might find we can do along the way.

*I found a post by professor/blogger Jeff Ollerton who also had his own take on the proposal. While he and I do not agree, he has an interesting take that I enjoyed reading. It should also be said that he has more expertise than me in this area.

Anti-CRISPRs Could Fine-Tune Genome Editing

Everything needs an off switch. I would have been bankrupt a long, long time ago if I could not turn off the lights in my apartment and C-3PO would have quickly worn out his welcome if he could not shut himself down like he did in Ben Kenobi’s hut. The important thing to remember here is that these things are useful most of the time: light helps me to see but it would not do me any good in the daytime, and C-3PO is like a sassy Google Translate…sometimes too sassy though. And it turns out that even the genome editor CRISPR-Cas9 has an off switch.

Maybe this is the first biology piece you have read in the last three years. If so, you may not know about CRISPR-Cas9 and the genome editing revolution. Commonly referred to as simply “CRISPR” in the popular press, CRISPR-Cas9 is a laboratory method for editing the DNA sequence in a living organism. Throughout the last several years, CRISPR-Cas9 has shown itself time and time again to be a simple and effective way of changing the genome of many different organisms. One group even pursued a controversial study that edited non-viable human embryos, showing that the method can likely be used to edit viable human embryos–as well as setting off a firestorm in the popular press and a lot of ethical hand-wringing within the biomedical community.

The CRISPR-Cas9 system was originally discovered in bacteria, where it functions as a kind of anti-viral immune system. As I have written before, viruses do their job by injecting genetic material–DNA in some cases–into a host cell. Some viruses specifically target bacteria. Much like our bodies have evolved defenses against pathogens, bacteria have evolved defenses against viral invaders. This is where CRISPR-Cas9 comes in. Scientists–at a yogurt company of all places–discovered pieces of viral DNA in the genome of a bacterial species that is normally used in yogurt-making. Interestingly, bacteria with these viral signatures were also immune to the corresponding virus. Later work showed that these stretches of viral DNA were actually added to the bacteria’s genome after a viral infection. After that initial infection, the new viral DNA pieces in the bacterium could be made into RNA and loaded onto the protein Cas9. The RNA-Cas9 complex is then free to go bind to DNA that is specified by the RNA, which would be viral DNA in this example. After seeking out complementary DNA from an invading virus, Cas9 performs its molecular function: cutting that DNA into pieces that cannot take over the host cell.
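If it helps to think of the RNA-Cas9 complex as a search-and-cut routine, here is a deliberately oversimplified Python sketch. Everything in it is an assumption made for illustration: the function name `cut_at_guide`, the made-up sequences, and especially the choice to cut in the middle of the match. Real Cas9 has additional requirements for recognizing and cutting a site that are not modeled here.

```python
def cut_at_guide(dna: str, guide_rna: str) -> list[str]:
    """Split `dna` wherever the sequence named by `guide_rna` occurs.

    Toy model only. The guide RNA pairs with one DNA strand, so its own
    letters (with U swapped for T) read the same as the opposite strand;
    that opposite strand is what we scan here.
    """
    target = guide_rna.upper().replace("U", "T")  # RNA writes T as U
    if not target:
        raise ValueError("guide_rna must not be empty")

    pieces, piece_start, search_from = [], 0, 0
    while (hit := dna.find(target, search_from)) != -1:
        cut = hit + len(target) // 2  # hypothetical cut point: mid-match
        pieces.append(dna[piece_start:cut])
        piece_start = cut
        search_from = hit + len(target)  # keep looking past this site
    pieces.append(dna[piece_start:])
    return pieces


# Made-up sequences: the "virus" carries the site, the "host" does not.
viral_dna = "AAACCGGTTTAAA"
host_dna = "AAAAAATTTTTT"
guide = "CCGGUU"

print(cut_at_guide(viral_dna, guide))  # ['AAACCG', 'GTTTAAA']
print(cut_at_guide(host_dna, guide))   # ['AAAAAATTTTTT']
```

The point of the sketch is the specificity: DNA that lacks the sequence spelled out by the RNA comes back in one piece, while DNA that carries it gets cut.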

Research on CRISPR-Cas9 has been moving forward at a rapid pace, so I could write exclusively about it and never run out of things to talk about. But a recently published result showed that some bacterial viruses have evolved special proteins to inactivate Cas9, effectively shutting down the CRISPR-Cas9 immune system. It has been known since the middle of the 20th century that protein activity can be controlled by the binding of another molecule. The phenomenon is broadly known as protein regulation, and it is useful because a cell often needs to fine-tune the activity of certain proteins in order to survive. For example, Escherichia coli bacteria prefer to use the sugar glucose for energy, but they can also produce an enzyme to utilize another sugar, lactose, for energy. Interestingly, a lactose molecule can bind to the protein that prevents the production of the lactose-digesting enzyme and allow for the utilization of lactose. Similarly to how lactose can control the protein that shuts down lactose metabolism, scientists recently discovered that a group of viral proteins can shut down Cas9. Importantly, they showed that the “anti-CRISPRs,” as they dubbed the molecules, can bind to the RNA-Cas9 complex and strongly inhibit the DNA-cutting activity of Cas9 in a test tube.

However, the real appeal of CRISPR-Cas9 is not that we can mix it with DNA in a test tube and see DNA cleavage. Instead, we can do all of this in a living cell and cause DNA mutations that can be useful for research or maybe even therapy. If we are going to continue using CRISPR-Cas9 in living cells–perhaps someday therapeutically–we are going to want to fine-tune its activity. Luckily, these same researchers showed that anti-CRISPRs can block CRISPR-Cas9 genome editing in human cells. This result could someday help to avoid “off-target effects” that CRISPR-Cas9 sometimes causes, which are basically just unintended editing effects that could cause more harm than good.

A Voyage of Viral Discovery

Richard Dawkins’ Selfish Gene came out 40 years ago, so it is only fitting that I get to write about the most selfish genes of all: viruses. Basically, viruses are pieces of genetic material–either DNA or RNA–surrounded by a protein shell and maybe some lipid membrane. Viruses are not living cells, and they do not fulfill most of the hallmarks of life that many of us learned in middle school: viruses do not catalyze their own chemical reactions, they are not made up of cells, and they do not reproduce on their own. In order to do the chemical reactions necessary to reproduce and make more copies of themselves, viruses must find a way to put that genetic material that they carry into a living host cell and trick the host into using the code as it would use its own genome. This is how the virus manages to make the host into a veritable virus factory.

Since viruses rely on living cells for almost everything, it has not been easy to study them. In fact, we did not even know that viruses existed until the late 19th century. The first viruses were isolated when scientists studying a pathogen found that they could run infectious material through the smallest available filters without removing the infectious factor. At that point, they just called them “non-filterable agents” and reasoned that they must be extremely small, even smaller than bacteria. Experiments by others in the early and mid-20th century went on to discover that viruses were mostly protein and nucleic acid (RNA or DNA), making them radically different from previously known cellular life.

As biologists, we were pretty late to the virus party–shoot, we pretty much knew what cells were shortly after the first microscopes were built in the 1600s, but it somehow took until the 1800s to know that there was something smaller that could cause disease–so it is no surprise that there is still a lot for us to learn about the tiny “non-filterable agents.” Appropriately, a recent paper in Nature claimed to find over 1000 distinct viruses that are all new to science. To make this discovery, the scientists first had to pick a group of cellular hosts in which to look for viruses. They settled on invertebrates, a diverse group of animals that includes everything from insects and squids to sea urchins and earthworms. They also had to decide what type of viruses they would look for, opting to search for RNA viruses, which invade a host using RNA instead of DNA as their genetic material. By collecting and sequencing RNA from over 200 different invertebrate species, they were able to piece together long strands of RNA using the sequencing data and a computer program. However, those long reconstructed strands of RNA did not necessarily come from a virus present within the host. Host cells make their own RNA all of the time using their own DNA as a template. In order to be sure that the piece of RNA they found originated in a virus, they needed a signature that could only be present in a viral RNA. They found that signature in the form of an RNA virus-specific gene called “RNA-dependent RNA polymerase” or RdRp. RNA viruses use RdRp to copy their RNA genome when they invade a host cell, but they have to bring their own as part of their RNA genome; animals just do not have an RdRp. (That is, unless you believe this group that claims to have found a possibly-functional RdRp gene in a bat genome. I hope you will agree with me when I say that living things tend to be amazing because all of the rules we have about them are inevitably broken in some other organism.)

With this handy tool to distinguish viral RNAs from the rest of the pool, the authors had a field day discovering new RNA viruses. In addition to classifying viruses based on the host they were discovered within, they also used a technique known as “phylogenetics” to compare the RNA sequence of all viruses in order to place them on a tree of life relative to each other. Since all life on earth can ultimately trace its roots back to one common ancestor that is an evolutionary relative of all of us, from human to bacterium, we can compare the nucleic acid sequences of organisms or viruses in order to infer their evolutionary distance from each other. For example, two viruses with relatively similar RdRp genes would be inferred to be quite closely related compared to a third virus with less sequence in common in the RdRp gene.
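The three-virus example can be sketched in a few lines of Python. This is only an illustration of the intuition, not the method used in the paper: the sequences are invented, the names (`p_distance`, `closest_pair`, `virus_a` and so on) are my own, and I am assuming the sequences have already been aligned to the same length. The distance used here is simply the fraction of positions at which two sequences differ.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and the same length")
    mismatches = sum(x != y for x, y in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """Names of the two sequences with the smallest pairwise distance."""
    return min(
        combinations(seqs, 2),
        key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]),
    )


# Invented, pre-aligned snippets standing in for three RdRp genes.
rdrp = {
    "virus_a": "AUGGCUGAUGAC",
    "virus_b": "AUGGCUGAUGAU",
    "virus_c": "AUGCCAGAAGUC",
}

for first, second in combinations(rdrp, 2):
    print(first, second, round(p_distance(rdrp[first], rdrp[second]), 3))

print(closest_pair(rdrp))  # ('virus_a', 'virus_b')
```

Here `virus_a` and `virus_b` differ at only one position out of twelve, so they would be inferred to be close relatives, while `virus_c` sits farther away from both. Real phylogenetic methods build on the same idea but use models of how sequences change over time rather than a raw mismatch count.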

These new viruses were not discovered as human pathogens, so it is unlikely that this finding will have any direct medical relevance. This result can instead be useful for ecologists and evolutionary biologists who want to understand the variety of viruses that infect the invertebrates studied. Moreover, since we know quite a lot about the evolutionary relationships between different invertebrates–owing to us having studied them quite intensely for decades or even centuries–we can now use the new phylogenetic information about viral genome relatedness to start to ask questions about how the viruses co-evolved with their hosts. For instance, a group of related beetles may tend to be infected with related RNA viruses. If this is the case, then it is possible that an early ancestor of those RNA viruses made a living infecting an early ancestor of those beetles. Basic studies like that might also help us to someday understand host-virus co-evolution in humans and our viruses. After all, humans are in no danger of hitting an evolutionary brick wall, and neither are our viral foes.