High-throughput sequencing has transformed transcriptomics, enabling deeper insights into RNA biology. Direct RNA Sequencing (DRS) and complementary DNA (cDNA) sequencing are two key methods used to study gene expression, transcript diversity, and RNA modifications.
Traditional short-read sequencing struggles to capture full-length transcripts and alternative splicing events due to its limited read length. In contrast, long-read sequencing technologies like Oxford Nanopore produce reads spanning thousands of bases, allowing for more precise transcriptome characterization.
This article walks through the workflows for Direct RNA Sequencing and cDNA Sequencing step by step, from sample preparation to data analysis, and closes with the limitations of each method.
Direct RNA Sequencing Workflow
Direct RNA sequencing (DRS) lets you study RNA molecules in their native state, without converting them to cDNA. Because no reverse transcription or amplification step intervenes, it gives a more faithful picture of the transcriptome and can detect splice variants, gene fusions, and RNA modifications that traditional sequencing methods may miss.
This makes the approach particularly useful for analysing full-length transcripts, since splice variants and gene fusions are detected without amplification bias. DRS also detects key RNA modifications, such as methylation, which play crucial roles in post-transcriptional regulation.
These modifications affect RNA stability, translation efficiency, and gene expression regulation, offering deeper insights into cellular functions and disease mechanisms.
1. RNA Sample Preparation
Careful RNA sample preparation is essential for high-quality sequencing outcomes. RNA integrity and purity must be maintained to limit contamination and degradation, and efficient extraction techniques are required to recover full-length transcripts, with their modifications intact, that yield informative data on gene expression.
Intact RNA is central to Direct RNA Sequencing (DRS), as it provides full-length transcripts and modifications, reducing bias and improving sequencing accuracy and reproducibility.
A. RNA Extraction
TRIzol extraction or column-based purification methods isolate high-quality RNA from the sample. TRIzol is a standard reagent that allows for phase separation, isolating RNA, DNA, and proteins from a single sample. Following the addition of chloroform and centrifugation, the sample separates into three phases: the aqueous phase (which contains RNA), the interphase (mostly DNA), and the organic phase (proteins and lipids). RNA is recovered from the aqueous phase, precipitated, and purified for further use.
Column-based RNA isolation uses a silica membrane to purify RNA. Following cell lysis, the lysate is passed through a column where the RNA binds to the silica under high-salt conditions. The column is washed to remove impurities, and the RNA is eluted with an elution buffer.
Maintaining RNA integrity and purity is essential for high-quality sequencing results. Total RNA should ideally have:
- RNA Integrity Number (RIN) > 8 to confirm intact transcripts. This is a widely accepted standard for high-quality RNA, especially for RNA sequencing. However, the required RIN threshold may vary depending on the protocol or research requirements, so it’s important to confirm the RIN standard for your specific work.
- 28S/18S ribosomal RNA ratio of 1.8-2.0 for mammalian samples.
- Absorbance ratio: OD260/280 of ~2.0, which indicates low protein contamination.
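The thresholds above can be turned into a simple pre-sequencing check. The following is a minimal sketch, not a standard tool: the `RnaQc` class, the `qc_failures` function, and the accepted OD260/280 window of 1.8 to 2.2 are assumptions made for illustration, and the cut-offs should be adjusted to your own protocol.

```python
from dataclasses import dataclass


@dataclass
class RnaQc:
    """QC readings for one total RNA sample (hypothetical container)."""
    rin: float            # RNA Integrity Number, 1 (degraded) to 10 (intact)
    ratio_28s_18s: float  # 28S/18S ribosomal RNA ratio
    a260_280: float       # OD260/280 absorbance ratio


def qc_failures(sample: RnaQc,
                min_rin: float = 8.0,
                min_rrna_ratio: float = 1.8,
                a260_280_range: tuple = (1.8, 2.2)) -> list:
    """Return the reasons a sample fails QC; an empty list means it passes."""
    failures = []
    # The article asks for RIN strictly above 8.
    if sample.rin <= min_rin:
        failures.append(f"RIN {sample.rin} is not above {min_rin}")
    if sample.ratio_28s_18s < min_rrna_ratio:
        failures.append(
            f"28S/18S ratio {sample.ratio_28s_18s} is below {min_rrna_ratio}")
    # The window around ~2.0 is an assumed tolerance, not a published limit.
    low, high = a260_280_range
    if not low <= sample.a260_280 <= high:
        failures.append(
            f"OD260/280 {sample.a260_280} is outside {low}-{high}")
    return failures


if __name__ == "__main__":
    print(qc_failures(RnaQc(rin=9.1, ratio_28s_18s=1.9, a260_280=2.0)))
    print(qc_failures(RnaQc(rin=6.5, ratio_28s_18s=1.2, a260_280=1.5)))
```

Returning the list of reasons, rather than a single pass/fail flag, makes it easier to see whether a sample failed on integrity, purity, or both.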
B. RNA Fragmentation (Optional)
RNA fragmentation is generally not recommended for Direct RNA Sequencing as it contradicts its main advantage of full-length transcript sequencing. However, fragmentation may be performed in specific cases where the goal is to sequence particular RNA regions rather than the entire full-length molecules.
When fragmentation is done, it is usually achieved using RNA fragmentation reagents or metal ion hydrolysis to generate RNA fragments of a controlled size range (~200-500 nt).
C. Adapter Ligation
Here, the poly(T) adapter is ligated onto the RNA molecule with T4 DNA ligase to enable effective capture of the RNA by the nanopore sequencing system. The poly(T) adapter incorporates a sequence recognized by the nanopore system so that it will bind the RNA molecules for reliable sequencing.
Sequencing adapters preloaded with motor proteins are then attached. The motor protein controls the speed at which the RNA translocates through the nanopore, which supports read accuracy.
2. Sequencing
Once the RNA library is prepared, Oxford Nanopore RNA Sequencing analyzes it directly, generating sequence data and insights into gene expression dynamics.
For Oxford Nanopore RNA Sequencing, about 500–700 ng of poly(A) RNA is normally needed. Oligo(dT) beads are used to purify the RNA and select for coding transcripts, specifically by binding to the poly(A) tails of mRNA.
Non-polyadenylated RNAs, such as histone mRNAs, certain viral RNAs, and some long non-coding RNAs (lncRNAs), are not selected by this means. Additional enrichment methods are needed to capture and sequence these RNA molecules. Let’s look at how Oxford Nanopore Direct RNA Sequencing handles the prepared library.
Oxford Nanopore Direct RNA Sequencing
- After RNA preparation, the RNA library is loaded into a flow cell that uses Oxford Nanopore sequencing technology, and data is generated in real time as sequencing proceeds.
- RNA molecules are driven through the nanopores, producing real-time electrical signals that are translated into nucleotide sequences.
- Unlike DNA, RNA translocates through the nanopore in the 3′-5′ direction, opposite to the conventional 5′-3′ sequencing direction used in DNA sequencing. Base-calling software automatically corrects the orientation for accurate sequence representation.
- Oxford Nanopore sequencing detects full-length transcripts, giving insights into alternative splicing, poly(A) tail length, and post-transcriptional modifications.
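The orientation correction mentioned above amounts to reversing the order of the bases, not taking a reverse complement, because the same RNA strand is being read from its 3′ end. The helper below is a hypothetical illustration of that idea; real base callers handle this internally on the signal or base-call level.

```python
def to_5p_3p(bases_in_pore_order: str) -> str:
    """Rewrite bases recorded 3'->5' (pore order) in conventional 5'->3' order.

    Only the order changes. The bases themselves are untouched, since the
    molecule read is the original RNA strand, not its complement.
    """
    return bases_in_pore_order[::-1]


if __name__ == "__main__":
    # The poly(A) tail sits at the 3' end, so it enters the pore first.
    pore_order = "AAAACCGGUA"
    print(to_5p_3p(pore_order))
```

In this toy read, the poly(A) stretch that was seen first ends up at the 3′ end of the reported sequence, where it belongs.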
3. Data Analysis and Interpretation
Once sequencing is complete, the next crucial step is data analysis. This phase involves interpreting the sequencing data to gain insights into gene expression, isoform identification, RNA modifications, and transcript structure. The raw signal is first processed through base calling and alignment; isoforms, modifications, and expression levels are then analyzed from the aligned reads.
A. Base Calling
The raw data obtained from the nanopore flow cell is converted into nucleotide sequences. Base calling software translates the electrical signals into base calls, typically using machine learning models. This step is important because every downstream analysis, from alignment to isoform detection, depends on the sequences being called correctly.
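Base callers commonly write reads as FASTQ records, in which each base carries a Phred quality score Q related to its estimated error probability p by Q = -10 · log10(p). The sketch below shows one way to summarize a read's quality under that convention. The function names are illustrative, and the ASCII offset of 33 is the usual FASTQ encoding, assumed here.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred score to an error probability: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_read_quality(quality_string: str, offset: int = 33) -> float:
    """Mean quality of one read, averaged in error-probability space.

    Averaging the probabilities (rather than the Q values) keeps a few very
    poor bases from being hidden by many good ones.
    """
    if not quality_string:
        raise ValueError("quality string is empty")
    probs = [phred_to_error_prob(ord(ch) - offset) for ch in quality_string]
    mean_p = sum(probs) / len(probs)
    return -10 * math.log10(mean_p)


if __name__ == "__main__":
    # '5' encodes Q20 (1% error) with the assumed offset of 33.
    print(round(mean_read_quality("5555"), 2))
    # One Q10 base ('+') and one Q40 base ('I'): the result sits near Q13,
    # far below the arithmetic mean of 25.
    print(round(mean_read_quality("+I"), 2))
```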
B. Read Alignment
Sequencing reads are aligned to a reference transcriptome or genome with the aid of long-read mappers, such as Minimap2. Alignment contextualizes sequence data so that researchers can pinpoint genes, quantify expression levels, and highlight areas of interest. Quality metrics, such as read length distribution and error rates, are checked to ensure that mapping is correct.
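As an illustration of the alignment step, the sketch below assembles a Minimap2 command line for spliced alignment to a genome. The presets follow the Minimap2 documentation (`-ax splice -uf -k14` for Nanopore direct RNA reads, `-ax splice` for Nanopore cDNA reads). The function name, file names, and thread count are hypothetical, and the command is only built here, not executed.

```python
def minimap2_command(reference: str, reads: str,
                     library: str = "direct_rna", threads: int = 4) -> list:
    """Build a minimap2 argument list for spliced long-read alignment."""
    if library == "direct_rna":
        # Forward-strand-only splice mode with a shorter k-mer for noisy reads.
        preset = ["-ax", "splice", "-uf", "-k14"]
    elif library == "cdna":
        preset = ["-ax", "splice"]
    else:
        raise ValueError(f"unknown library type: {library!r}")
    return ["minimap2", *preset, "-t", str(threads), reference, reads]


if __name__ == "__main__":
    cmd = minimap2_command("genome.fa", "reads.fastq")
    print(" ".join(cmd))
    # To run it (requires minimap2 on PATH), something like:
    #   import subprocess
    #   with open("aln.sam", "w") as out:
    #       subprocess.run(cmd, stdout=out, check=True)
```

When reads are aligned to a reference transcriptome instead of a genome, spliced alignment is not needed, so a different preset would be chosen.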
C. Isoform Detection & RNA Modifications
Direct RNA sequencing has the advantage of identifying full-length transcript isoforms as well as alternative splicing events. By examining the alignments, different isoforms of a gene can be distinguished, providing insight into gene regulation. Deviations in the nanopore signal from the pattern expected for unmodified bases can also reveal RNA modifications such as m6A, m5C, and pseudouridine. These modifications influence RNA stability, translation efficiency, and the regulation of gene expression.
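One way to see how isoforms are read off an alignment: in SAM output from a spliced aligner, introns appear as `N` operations in the CIGAR string, so the chain of `N` intervals in a full-length read describes its splice pattern. The sketch below extracts those intervals. It assumes a 0-based alignment start (SAM's POS field is 1-based, so subtract 1 first), and the function name is illustrative.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def splice_junctions(ref_start: int, cigar: str) -> list:
    """Return introns as (start, end) pairs, 0-based and half-open."""
    junctions = []
    pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op == "N":
            # A skipped reference region, i.e. an intron in spliced alignment.
            junctions.append((pos, pos + length))
        if op in REF_CONSUMING:
            pos += length
    return junctions


if __name__ == "__main__":
    # Three exons separated by two introns, with a 2-base insertion in exon 2.
    print(splice_junctions(100, "50M200N30M2I20M500N40M"))
```

Reads that share the same junction chain are candidates for the same isoform, while a read with an extra or missing junction points to an alternative splicing event.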
D. Quantitative Transcriptome Analysis
Gene expression is quantified at the level of individual transcripts. This offers a more nuanced view of gene regulation, differential expression, and how modifications could influence gene activity. By comparing expression levels between conditions, scientists can identify important regulatory patterns and gene expression changes.
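A minimal sketch of comparing expression between two conditions is shown below. It assumes that each long read represents one transcript molecule, so counts are scaled only by library size (counts per million) and not by transcript length. The transcript names, counts, and pseudocount are made up for illustration, and a real analysis would use replicates and a statistical test.

```python
import math


def counts_per_million(counts: dict) -> dict:
    """Scale raw read counts per transcript to counts per million reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads counted")
    return {tx: n * 1_000_000 / total for tx, n in counts.items()}


def log2_fold_change(cpm_a: dict, cpm_b: dict, transcript: str,
                     pseudocount: float = 1.0) -> float:
    """log2(B / A) for one transcript; the pseudocount avoids division by zero."""
    a = cpm_a.get(transcript, 0.0) + pseudocount
    b = cpm_b.get(transcript, 0.0) + pseudocount
    return math.log2(b / a)


if __name__ == "__main__":
    control = counts_per_million({"tx1": 50, "tx2": 50})    # hypothetical counts
    treated = counts_per_million({"tx1": 150, "tx2": 50})
    for tx in ("tx1", "tx2"):
        print(tx, round(log2_fold_change(control, treated, tx), 3))
```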
A study on infection highlighted the value of Direct RNA Sequencing, delivering unique insights into the pathogen’s behavior and host interactions.
While Direct RNA Sequencing provides a native view of RNA molecules, cDNA Sequencing offers an alternative approach with different advantages. Below, we outline the key steps in the cDNA sequencing workflow.
cDNA Sequencing Workflow
cDNA sequencing has long been a core tool in transcriptomics, enabling gene expression analysis. However, unlike Direct RNA sequencing, it is subject to reverse transcription biases, which can lead to coverage gaps in highly structured RNAs or long transcripts.
The cDNA sequencing method is widely applied because of its compatibility with high-throughput sequencing platforms. It is a versatile tool capable of measuring gene expression, discovering new transcripts, and characterizing splice variants. Because it yields large amounts of data from a single sample, cDNA sequencing is the method of choice for many investigators.
1. RNA Isolation
High-quality RNA is isolated from biological samples using techniques such as TRIzol extraction or column-based purification. These methods effectively separate RNA from DNA and proteins. The purity of the RNA is checked by spectrophotometry (e.g., NanoDrop) and its integrity by electrophoresis (e.g., Bioanalyzer), ensuring a high-quality sample for sequencing.
The RNA’s RIN (RNA Integrity Number) should be above 8, and the 28S/18S ribosomal RNA ratio should ideally be 1.8-2.0 for mammalian samples.
2. cDNA Synthesis
The RNA is reverse transcribed into complementary DNA (cDNA) by reverse transcriptase enzymes such as M-MLV or SuperScript III. The reaction is primed with oligo(dT) primers (to select polyadenylated mRNA) or random primers (for total RNA).
Strand-switching can be employed in certain protocols to produce full-length cDNA, which is useful in the detection of splice variants and gene fusions.
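To make the relationship between template and product concrete: first-strand cDNA is the reverse complement of the RNA, written with T in place of U. The short sketch below shows this for a toy sequence. The function name is illustrative, and the sketch ignores primers, strand-switching, and any enzyme errors.

```python
# DNA base that pairs with each RNA base.
_DNA_COMPLEMENT_OF_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(rna_5p_3p: str) -> str:
    """Return the first-strand cDNA (5'->3') for an RNA given 5'->3'."""
    return "".join(_DNA_COMPLEMENT_OF_RNA[base]
                   for base in reversed(rna_5p_3p.upper()))


if __name__ == "__main__":
    # A toy mRNA ending in a short poly(A) tail.
    print(first_strand_cdna("AUGGCCAAAA"))
```

The cDNA begins with a run of T, which corresponds to the oligo(dT) primer annealing to the poly(A) tail and being extended toward the 5′ end of the RNA.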
3. Library Preparation
A. Adapter Ligation
Following cDNA synthesis, sequencing adapters are ligated to the cDNA molecules with a DNA ligase. The adapters are necessary for the cDNA molecules to engage with the nanopore sequencing system, and the motor protein they carry enables controlled translocation of the cDNA through the nanopore for sequencing.
B. Poly(A) Enrichment
Normally, poly(A)+ RNA (mRNA) is isolated before reverse transcription by capturing the poly(A) tail of the RNA with oligo(dT) beads. This process enriches coding transcripts (mRNA), but further enrichment may be required for RNAs that are not polyadenylated, e.g., histone mRNAs, certain viral RNAs, or certain lncRNAs (long non-coding RNAs).
C. Fragmentation
While fragmentation is not typically required for Oxford Nanopore sequencing in the context of cDNA, it can be used in specific cases where smaller, targeted RNA regions need to be sequenced.
4. Sequencing
The ready-to-use cDNA library is added to the Oxford Nanopore sequencing flow cell. Nanopore sequencing operates by passing the cDNA molecules through a nanopore within a membrane. As the molecules pass through the pore, real-time electrical signals are produced, which are translated into nucleotide sequences.
Unlike most sequencing technologies, Oxford Nanopore sequencing can read full-length cDNA in a single pass without fragmenting it first. This is an advantage when detecting complex RNA species and capturing the complete spectrum of alternative splicing events and gene expression patterns.
Real-time sequencing provides direct information on splice variants and the cDNA sequence as a whole. Because the native RNA is not read directly, RNA modifications cannot be detected from cDNA reads.
5. Data Analysis
Once sequenced, the information is analyzed through base calling, wherein the electrical currents generated by the nanopores are decoded into nucleotide sequences.
The reads are subsequently mapped onto a reference transcriptome using long-read mappers such as Minimap2. This mapping forms the foundation for precise detection of gene expression, identification of isoforms, and alternative splicing events.
Gene expression is quantified by counting the reads assigned to each gene, and differential expression between conditions is then tested with software such as DESeq2 or edgeR. These programs work on the number of reads per gene, thereby generating a gene activity profile.
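Before read counts from different samples can be compared, they need to be normalized for sequencing depth. DESeq2 does this with a median-of-ratios approach. The sketch below is a simplified pure-Python illustration of that idea, not DESeq2's implementation, and the sample names and counts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts: dict) -> dict:
    """Median-of-ratios size factors from {sample: {gene: count}}."""
    samples = list(counts)
    shared = set.intersection(*(set(counts[s]) for s in samples))
    # Only genes with a nonzero count in every sample have a defined log.
    usable = [g for g in shared if all(counts[s][g] > 0 for s in samples)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log of the geometric mean across samples: a pseudo-reference per gene.
    log_geo_mean = {
        g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
        for g in usable
    }
    # Each sample's factor is the median ratio to the pseudo-reference.
    return {
        s: math.exp(median(math.log(counts[s][g]) - log_geo_mean[g]
                           for g in usable))
        for s in samples
    }


if __name__ == "__main__":
    counts = {
        "s1": {"g1": 10, "g2": 20, "g3": 30},
        "s2": {"g1": 20, "g2": 40, "g3": 60},  # same profile, twice the depth
    }
    factors = size_factors(counts)
    print({s: round(f, 3) for s, f in factors.items()})
```

Dividing each sample's counts by its size factor puts the two samples on a common scale. Here the second sample's factor comes out at twice the first, matching its doubled depth.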
Oxford Nanopore sequencing can also identify modifications in RNA, but only when native RNA is sequenced directly rather than cDNA. The modifications are inferred from nanopore signal deviations, and dedicated tools such as Tombo or Nanocompore help identify these changes without further chemical labeling.
Note: RNA integrity and purity are essential for effective direct RNA and cDNA sequencing. Biostate AI makes this easy with a comprehensive, budget-friendly end-to-end solution that takes care of RNA extraction, library preparation, and sequencing—delivering high-quality output without breaking research budgets.
Limitations of Direct RNA and cDNA Sequencing Technologies
While both Direct RNA Sequencing (DRS) and cDNA Sequencing are valuable tools for transcriptome analysis, they come with certain limitations that researchers must consider:
- Error Rate: One of the primary challenges with Nanopore RNA sequencing is its relatively higher error rate compared to other sequencing technologies, especially in base calling. Although advancements in data processing software are continuously improving accuracy, sequencing errors can still impact the precision of identifying single nucleotides.
- Impact of RNA Degradation: The read length may be affected by RNA degradation or sample quality, which can lead to gaps in the data or fewer long reads in complex samples.
- Signal Noise: Noise and fluctuations in the electrical signal can complicate the analysis, particularly when dealing with low-abundance RNA species.
- Reverse Transcription Bias: In cDNA sequencing, reverse transcription introduces biases, as some RNA regions may not be efficiently converted into cDNA, especially in highly structured or long transcripts.
- Amplification Bias: When the cDNA protocol includes PCR, the amplification step can preferentially amplify abundant transcripts, potentially leading to underrepresentation of less abundant ones. Additionally, cDNA sequencing cannot detect RNA modifications in their native state, unlike Direct RNA Sequencing.
- General Considerations: Both sequencing methods depend heavily on RNA quality. Degraded RNA or low-quality samples can lead to incomplete or biased data. Moreover, the preparation steps for both methods are sensitive to contamination, which can introduce errors into the results. These limitations should be kept in mind when selecting the appropriate method for a specific research goal.
Conclusion
Direct RNA sequencing and cDNA sequencing have fundamentally transformed transcriptomics by enabling a more comprehensive analysis of RNA molecules. Direct RNA sequencing eliminates the need for reverse transcription, allowing for the detection of full-length transcripts, RNA modifications, and isoforms in their natural state, providing unmatched resolution.
On the other hand, cDNA sequencing, despite its biases from reverse transcription, continues to play a vital role in quantifying gene expression and discovering new transcripts.
As sequencing technologies advance, particularly with Oxford Nanopore RNA sequencing, the exploration of gene regulation, disease mechanisms, and therapeutic strategies is becoming more precise than ever.
If you’re looking to explore RNA sequencing for your research, Biostate AI offers sequencing solutions that deliver comprehensive data for your studies, from full-length transcript detection to RNA modification analysis.
Disclaimer
This article is intended for informational purposes and is not intended as medical advice. Any applications in clinical settings should be explored in collaboration with appropriate healthcare professionals.
Frequently Asked Questions
1. How does nanopore sequencing work?
Nanopore sequencing is a type of single-molecule sequencing technology that detects changes in an electric current as nucleic acids (DNA or RNA) pass through a tiny protein pore (nanopore). The process works as follows:
- DNA or RNA molecules are captured by the nanopore, and as the molecule passes through the pore, it disrupts an electric current that is being applied across the pore.
- Each nucleotide (A, T, C, G for DNA or A, U, C, G for RNA) causes a characteristic change in the current, and this change is recorded by a detector.
- The system translates these current changes into sequence data by comparing the disruptions to known patterns for each nucleotide.
- The real-time reading capability allows for fast analysis, and it doesn’t require amplification or chemical labeling.
Nanopore sequencing is widely known for its ability to sequence long reads (thousands of bases long), which is a key advantage over short-read sequencing technologies.
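The idea of matching current changes to known patterns can be shown with a deliberately simplified toy decoder. Everything in it is an assumption for illustration: the reference current levels are invented, and each measurement is treated as coming from a single base. In practice the signal reflects several neighbouring nucleotides at once, and production base callers rely on trained neural networks rather than a lookup like this.

```python
# Hypothetical mean current levels in picoamps; invented for illustration.
REFERENCE_LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "U": 125.0}


def decode_toy_signal(levels_pa: list) -> str:
    """Assign each current level to the base with the closest reference level."""
    return "".join(
        min(REFERENCE_LEVELS_PA,
            key=lambda base: abs(REFERENCE_LEVELS_PA[base] - level))
        for level in levels_pa
    )


if __name__ == "__main__":
    # Four noisy measurements, one per base in this toy model.
    print(decode_toy_signal([81.2, 124.0, 108.5, 96.1]))
```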
2. What is the direction of nanopore RNA sequencing?
In nanopore sequencing of RNA, the RNA molecule is typically translocated through the nanopore in the 3′ to 5′ direction. This is the reverse of DNA sequencing, which typically proceeds in the 5′ to 3′ direction.
The directionality is important because it impacts how the software interprets the signal data and reconstructs the sequence. Once the RNA passes through the nanopore in the 3′ to 5′ direction, the sequencing software corrects the orientation to reflect the natural 5′ to 3′ direction of transcription.
3. What are the problems with nanopore sequencing?
Despite its many advantages, nanopore sequencing has some limitations:
- Error rates: Nanopore sequencing is known to have higher error rates compared to other sequencing technologies, particularly for base calling. These errors can affect the accuracy of single nucleotide identification. However, newer software tools and algorithms are being developed to improve the accuracy of nanopore sequencing.
- Read length limitations: While nanopore sequencing excels in generating long reads, the overall read length may still be limited by factors like RNA degradation, sample quality, and sequencing technology. It can sometimes result in gaps in data or fewer long reads, especially in complex samples.
- Signal noise and resolution: Nanopore sequencing depends on detecting changes in electrical currents as the molecule passes through the nanopore. Signal noise and fluctuations in the data can complicate the analysis, especially when sequencing complex or low-abundance RNA.
However, these challenges are continually being addressed through improvements in hardware, software, and data processing pipelines, which continue to make nanopore sequencing a promising technology in genomics and transcriptomics.