
Long and Short Read Sequencing: Revolutions in Genomics Data  

In this Discussion Group Report, we investigate some recent applications of long- and short-read sequencing technologies. Continued improvements and advancements in the field mean sequencing errors are increasingly a thing of the past, while the portability of some new units makes them ideal for work in the field.

Many of the opportunities and sequencing techniques available to researchers today feel like products of science fiction, but all of the latest methods discussed in this NextGen Omics Discussion Group are very much science fact. New developments in efficiency and portability make it cheaper than ever to sequence a genome partially or fully, and the speed at which this can be achieved is constantly increasing.

Discussion was led by Patrick Descombes, Senior Expert in Genomics and Genomics Group Leader at Nestle Research, and co-led by Edward Oakeley, Associate Director of Genomics at Novartis, and Massimo Delledonne, Professor of Genetics at the University of Verona. Read on to learn about some of the changes that have taken place in the field of long- and short-read sequencing since the turn of the millennium.

Advances in Genomics Technologies

“I think we all know there’s been a fantastic revolution in genomics technologies over the past 15-20 years,” Patrick Descombes said as he opened the discussion. Advances in the field are consistently hitting new milestones, with tremendous increases in throughput and reductions in cost. “Some companies are claiming they can go down to less than 100 dollars per genome, which of course is just fantastic. And we know that these types of technologies can be applied to any species with DNA, be they humans, bacteria, or plants.”

According to Descombes, this has led to a ‘fantastic development’ of very large-scale studies, driven by the development of these methods and a reduction in cost. “Still, genomics is a field which is quite complex, and mastering these technologies and their application requires some expertise.”

When it comes to specific technological developments, long-read sequencing technologies are particularly useful for resolving complex genomic regions. Synthetic long reads, by contrast, use ‘tricks’ to assemble long reads from short-read segments. “When people ask ‘what should I do’, clearly the choice of the approach and technology is driven by the goal of the study,” said Descombes. As such, a major focus for researchers is on the complementarity of long- and short-read sequencing technologies and applications.

Portable Long-Reads

After Descombes had introduced the range of topics for discussion, Massimo Delledonne kicked off the conversation by recounting a recent field expedition to the Mongolian Gobi Desert. He explained that his team had been using Oxford Nanopore sequencing for the project because of its high portability. “We were keeping material frozen during the night by using the Oxford Nanopore box container, which was fantastic, and during the day we were using a portable freezer to keep the sample frozen.”

Descombes asked Delledonne what the benefits were of sequencing on-site rather than collecting samples in the field and processing them in the laboratory. “If you think about Mongolia, they have no sequencer,” Delledonne replied. “In a hotspot for biodiversity, they can analyse the sample there and it’s a very limited cost, because Oxford Nanopore is very cheap.” Delledonne added that many species in Mongolia were on the red list and at risk of disappearing, which made the ability to monitor them in situ very valuable.

Obstacles to Future Sequencing in Genomics

Hywel Williams, Senior Lecturer in Bioinformatics at Cardiff University, explained that he worked on ways of looking at rare diseases and identifying causative variants in patients who are diagnostically negative. “One of the things I’m interested in with long-read sequencing is whether it can identify pathogenic variants in these kinds of patients.” 

Williams added that expense and historical samples that cannot be re-collected are two of the main barriers to developing this understanding further. “Third, when you find something, how can you understand whether that is potentially pathogenic, or whether it’s just a rare variant in the genome that we haven’t sequenced enough people to understand?” He made the case for a ‘gnomAD’ for long reads: a Genome Aggregation Database of long-read structural variants, to improve genomic resolution.
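
To illustrate why an aggregated reference of structural variants would help with this question, here is a minimal Python sketch. It assumes a small cohort of diploid samples with an alternate-allele count recorded per variant. The variant identifiers, function names and thresholds are hypothetical and are not taken from gnomAD or from the discussion.

```python
from collections import Counter


def allele_frequencies(cohort_calls):
    """Return the alternate-allele frequency of each variant in a cohort.

    cohort_calls maps sample name -> {variant_id: alt allele count (1 or 2)}.
    Assumes diploid autosomal variants, so each sample contributes 2 alleles.
    """
    counts = Counter()
    for genotypes in cohort_calls.values():
        for variant_id, alt_count in genotypes.items():
            counts[variant_id] += alt_count
    total_alleles = 2 * len(cohort_calls)
    return {variant_id: n / total_alleles for variant_id, n in counts.items()}


def rare_in_cohort(patient_variants, frequencies, threshold=0.01):
    """Keep patient variants that are absent from, or rare in, the cohort."""
    return sorted(
        v for v in patient_variants if frequencies.get(v, 0.0) < threshold
    )


# Hypothetical toy cohort of four samples (illustrative identifiers only).
cohort = {
    "s1": {"DEL_chr1_1000": 1},
    "s2": {"DEL_chr1_1000": 2, "INS_chr2_500": 1},
    "s3": {"DEL_chr1_1000": 1},
    "s4": {},
}
frequencies = allele_frequencies(cohort)

# A loose threshold is used here only because the toy cohort is tiny.
candidates = rare_in_cohort(
    {"DEL_chr1_1000", "INS_chr2_500", "INV_chr3_42"},
    frequencies,
    threshold=0.2,
)
```

A variant that is common in the cohort is unlikely to explain a rare disease, while one that is rare or absent remains a candidate. With only a handful of sequenced genomes, almost every structural variant looks rare, which is the limitation Williams describes.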

Descombes added that the field was, broadly speaking, still in its infancy. “In order to understand the consequence of having phased variants as opposed to unphased as most of the field has been evolving with this artificial notion of the human haploid genome with short-read sequencing, so it’s only now starting to go into this. A lot of the new development will have to come from sequencing, not only family tree yields, but also larger cohorts with long reads.” 

Potential Applications of the Pan-Genome 

Discussion then turned to the concept of the pan-genome: a collection of related genomes represented as a single graph, in which the sequence the genomes share and the points where they diverge are both captured. The goal of developing the pan-genome is to simplify the ways in which analysis can be applied and executed. “When you want to map your own genome on a graph of the pan-genome, you should have a high-quality genome,” said Delledonne. “You cannot simply use short-reads to build up a graph pan-genome because they don’t know where to be inserted or added to the pan-genome when they don’t map on the graph.”
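
As a rough illustration of the graph idea, the Python sketch below stores sequence segments as nodes and each genome as a path through them. It is a toy model under stated assumptions, not how any particular pan-genome tool works; the segment sequences, genome names and function names are all hypothetical.

```python
from collections import defaultdict


def build_graph(paths):
    """Build directed edges from genome paths (lists of segment ids)."""
    edges = defaultdict(set)
    for path in paths.values():
        for current, following in zip(path, path[1:]):
            edges[current].add(following)
    return edges


def spell(path, segments):
    """Reconstruct a genome's sequence by concatenating its segments."""
    return "".join(segments[node] for node in path)


def branch_points(edges):
    """Return nodes with more than one successor, where genomes diverge."""
    return sorted(node for node, succ in edges.items() if len(succ) > 1)


# Hypothetical segments and two genomes that differ at a single base.
segments = {1: "ACGT", 2: "G", 3: "T", 4: "CCA"}
paths = {"genomeA": [1, 2, 4], "genomeB": [1, 3, 4]}

graph = build_graph(paths)
splits = branch_points(graph)
sequence_a = spell(paths["genomeA"], segments)
sequence_b = spell(paths["genomeB"], segments)
```

Both genomes share segments 1 and 4 and split at node 1, so the shared sequence is stored once. Sequence that matches no existing node has no obvious place in the graph, which is the difficulty with unmapped short reads that Delledonne points to.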

Williams mentioned that he had worked extensively with the UK’s 100,000 Genomes Project, and that the next stage would involve the sequencing of roughly 2,000 people with PacBio. “In terms of strategy, that is probably the optimum way to go forward at the moment. At this level, a couple of thousand genomes is going to be the minimum amount we need, but if you can do that first couple of thousand and potentially get a handful of diagnoses thrown into that as well then that is going to be a really good use of that money.” 

“The 100,000 Genomes Project in the UK is fantastic,” Delledonne agreed. “It’s a great example of what you can do now when you mix the genomics genetic data with the phenotype. So nowadays, genotyping is as important as phenotyping. At the end, you will have a lot of high quality genomics data, but don’t forget you need to do something with this data. You need to associate them to phenotypes.” 

Overall, we were incredibly satisfied with the conversation facilitated here, and we’re looking forward to hosting more soon.