Total RNA sequencing (total RNA-seq) is a powerful method that enables comprehensive profiling of both coding and non-coding RNA species in a biological sample. Unlike traditional mRNA-seq, which focuses exclusively on polyadenylated messenger RNAs (mRNAs), total RNA-seq captures the full transcriptomic landscape.
Total RNA-seq offers a more inclusive view of transcriptional activity across coding and non-coding regions. This makes it especially valuable in studies where regulatory RNAs play a central role or where alternative splicing and transcript variants are of particular interest.
This guide walks through each stage of the total RNA-seq data analysis pipeline, from sample preparation and library construction to advanced data interpretation.
Key Takeaways
- Captures a broad range of RNA species, including non-polyadenylated RNAs such as lncRNAs, circRNAs, and pre-mRNAs, unlike traditional mRNA-seq.
- Requires rRNA depletion, not poly(A) selection, for unbiased transcriptome profiling.
- Enables splicing analysis, fusion detection, and non-coding RNA discovery with tools like STAR, DESeq2, rMATS, and Salmon.
- Faces challenges like multimapping reads, incomplete annotations, and rRNA contamination, demanding careful QC and alignment.
What is Total RNA-Sequencing?
Total RNA-sequencing (total RNA-seq) provides a comprehensive snapshot of all RNA molecules present in a biological sample. Unlike mRNA-seq, which captures only polyadenylated transcripts, total RNA-seq uses rRNA depletion to retain broader RNA classes.
This includes coding mRNAs and non-coding RNAs like lncRNAs, circRNAs, snoRNAs, snRNAs, and precursor microRNAs (pre-miRNAs). Total RNA-seq also detects transcripts typically excluded in small RNA-seq, such as RNAs longer than 200 nucleotides. In contrast, small RNA-seq targets only short RNAs like miRNAs and piRNAs, missing longer regulatory RNA species.
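Total RNA-seq provides a largely unbiased view of the transcriptome by removing abundant rRNAs (which make up ~80% of total RNA) while preserving the complexity of the other RNA types, polyadenylated or not. Very short RNAs such as mature miRNAs are often under-represented after standard fragmentation and size selection, which is why small RNA-seq remains a separate protocol.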
Why Total RNA-Seq Matters in Research?
Total RNA-seq has become the preferred method in studies where transcript diversity plays a central role. Its ability to capture the full range of RNA molecules enables researchers to investigate:
- Alternative splicing and transcript isoform usage across cell states or disease conditions.
- Regulatory roles of non-coding RNAs, especially lncRNAs and antisense transcripts, in transcriptional and epigenetic control.
- Non-polyadenylated RNAs important in chromatin organization, transcriptional interference, and RNA-based gene silencing.
- Circular RNAs and pre-mRNAs, often overlooked in poly(A)-based sequencing but increasingly recognized for their functional roles.
- Host-pathogen interactions, where certain viral RNAs and host responses involve non-polyadenylated transcripts.
Biostate AI’s RNA-seq platform helps uncover these complex layers of gene regulation with unmatched sensitivity and minimal input requirements, ideal for both exploratory and hypothesis-driven research.
Applications Across Disciplines

- Cancer biology: Mapping lncRNA expression, fusion transcripts, and splicing variants linked to tumor progression.
- Neurodegeneration: Investigating RNA processing defects, retained introns, and non-coding RNA dysregulation in disorders like ALS and Alzheimer’s.
- Infectious disease: Capturing viral RNAs (including non-polyadenylated genomes), host lncRNA response, and interferon-stimulated transcripts.
- Stem cell and developmental biology: Profiling early-stage or lineage-specific RNAs before polyadenylation and splicing are complete.
Advantages of Total RNA-Sequencing
- Unbiased transcriptome coverage, regardless of polyadenylation status.
- Greater regulatory insight, especially from lncRNAs, intronic reads, and pre-mRNA quantification.
- Better data for gene model refinement, isoform discovery, and non-coding RNA annotation.
By capturing the full RNA landscape, total RNA-seq delivers the resolution and depth needed to dissect complex transcriptional programs that drive health, disease, and development, making it an essential tool for modern transcriptomics.
Total RNA-seq delivers powerful insights, but extracting meaningful data requires a carefully structured approach. The following workflow outlines each critical step, from RNA extraction to advanced interpretation, to support high-quality, reproducible results across diverse sample types.
Total RNA-Seq Workflow Overview

Total RNA sequencing follows a systematic pipeline designed to retain all RNA species, including coding and non-coding elements. Each stage influences data quality and interpretability, making consistency and precision essential for accurate total RNA sequencing data analysis.
1. Sample Collection & RNA Extraction
The workflow begins with high-quality biological material, such as fresh-frozen tissues, blood, cultured cells, or FFPE samples. RNA is stabilized immediately using RNase inhibitors or commercial preservation buffers to prevent degradation.
Total RNA is then extracted using either silica column-based kits or phenol-chloroform protocols. Following extraction, RNA concentration and purity are measured by spectrophotometry, evaluating A260/A280 and A260/A230 ratios. Finally, RNA integrity is assessed using a Bioanalyzer or TapeStation to ensure suitability for downstream applications.
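As a rough illustration of the purity check described above, the sketch below computes the A260/A280 and A260/A230 ratios from raw absorbance readings and flags samples that fall below a cutoff. The function name, the `PurityReport` type, and the default cutoffs of 1.8 are assumptions chosen for illustration; they are not taken from a specific kit or protocol, so adjust them to your lab's own acceptance criteria.

```python
from dataclasses import dataclass


@dataclass
class PurityReport:
    a260_a280: float
    a260_a230: float
    passes: bool


def assess_purity(a230: float, a260: float, a280: float,
                  min_260_280: float = 1.8,
                  min_260_230: float = 1.8) -> PurityReport:
    """Compute absorbance ratios and apply illustrative (assumed) cutoffs."""
    if a230 <= 0 or a280 <= 0:
        raise ValueError("absorbance readings must be positive")
    ratio_280 = a260 / a280  # low values suggest protein or phenol carry-over
    ratio_230 = a260 / a230  # low values suggest salt or solvent carry-over
    passes = ratio_280 >= min_260_280 and ratio_230 >= min_260_230
    return PurityReport(round(ratio_280, 2), round(ratio_230, 2), passes)


# Hypothetical readings for two samples
clean = assess_purity(a230=0.50, a260=1.00, a280=0.50)
salty = assess_purity(a230=0.80, a260=1.00, a280=0.50)
print(clean)
print(salty)
```

Spectrophotometric ratios only speak to purity. Integrity still has to be checked separately on a Bioanalyzer or TapeStation.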
2. rRNA Depletion or Selection
Ribosomal RNA accounts for the majority of total RNA, so depleting it is essential. For total RNA-seq, rRNA is removed instead of enriching for polyadenylated transcripts. This is usually done using:
- Probe-based depletion (e.g., Ribo-Zero, NEBNext): hybridizes and removes rRNA.
- Enzymatic methods: degrade rRNA post-extraction.
This step retains both polyadenylated and non-polyadenylated RNAs, such as lncRNAs, circRNAs, snoRNAs, and pre-mRNAs.
| RNA Type | Length | Function |
| --- | --- | --- |
| mRNA | >200 nt | Protein coding |
| lncRNA | >200 nt | Regulatory, epigenetic scaffolding |
| circRNA | Variable | miRNA sponges, transcriptional regulation |
| snoRNA/snRNA | <200 nt | rRNA modification, splicing machinery |
| pre-miRNA | ~70 nt | Precursors to mature miRNAs |
| mature miRNA | ~22 nt | Post-transcriptional gene silencing |
3. Library Preparation
After RNA fragmentation, the resulting fragments undergo reverse transcription to generate complementary DNA (cDNA). This is followed by adapter ligation and PCR amplification, producing a sequencing-ready library. In many protocols, strand-specific library preparation is employed, preserving the original directionality of transcripts.
This strand information is crucial for accurately identifying antisense RNAs and resolving overlapping genes within the transcriptome.
Key considerations:
- Use random primers for unbiased transcript coverage
- Minimize PCR cycles to reduce amplification bias
- Perform size selection to enrich for desired insert lengths (~200–400 bp)
4. Sequencing
Prepared libraries are sequenced on high-throughput platforms, most commonly Illumina. Paired-end reads, such as 2×100 bp, enhance splice junction detection and isoform resolution.
Platform selection depends on:
- Read depth (20–100M reads/sample for most studies)
- Study goal (discovery vs. targeted validation)
- Budget and sample quality
5. Data Processing & Quality Control
Raw reads must be cleaned and assessed before alignment.
Standard steps:
- Adapter trimming (e.g., with Cutadapt or Trimmomatic)
- Quality filtering to remove low-quality reads or contaminants
- Assessment of base quality, GC content, and duplication rates using FastQC, with MultiQC to aggregate the per-sample reports
QC at this stage prevents downstream artifacts and saves time during analysis.
6. Alignment & Quantification
Cleaned reads are aligned to a reference genome or transcriptome using splice-aware tools like STAR or HISAT2, which can map reads that span exon–exon splice junctions and resolve complex transcripts.
After alignment:
- Generate read counts using featureCounts or HTSeq
- Quantify transcript abundance using TPM, FPKM, or raw counts
- For novel transcript discovery, tools like StringTie or Cufflinks can reconstruct isoforms
7. Differential Expression & Functional Analysis
Raw counts are compared across conditions using statistical tools like DESeq2, edgeR, or limma-voom, which apply their own normalization internally, to identify genes or transcripts with significant expression changes.
Follow-up includes:
- Functional enrichment analysis (GO terms, KEGG pathways)
- lncRNA function prediction
- Isoform-level expression patterns in alternative splicing studies
8. Visualization & Interpretation
Data interpretation relies on intuitive and biologically relevant visualizations.
Recommended visualizations and tools:
- Volcano plots for DEGs
- Heatmaps and PCA plots for clustering
- Genome browser views (e.g., IGV) to explore read coverage across loci
- Network analysis to explore regulatory interactions, especially involving ncRNAs
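To make the volcano plot idea concrete, here is a minimal sketch of the classification step that usually sits behind one: each gene gets an x-coordinate (log2 fold change), a y-coordinate (negative log10 of the adjusted p-value), and a label. The cutoffs (|log2FC| of at least 1, adjusted p below 0.05), the gene names, and the values are all assumptions for illustration, not recommendations from a specific tool.

```python
import math
from typing import Dict, Tuple


def classify_gene(log2_fc: float, padj: float,
                  fc_cutoff: float = 1.0, alpha: float = 0.05) -> str:
    """Label a gene for volcano-plot colouring using assumed cutoffs."""
    if padj < alpha and log2_fc >= fc_cutoff:
        return "up"
    if padj < alpha and log2_fc <= -fc_cutoff:
        return "down"
    return "ns"  # not significant, or effect size too small


def volcano_point(log2_fc: float, padj: float) -> Tuple[float, float]:
    """Return (x, y) plot coordinates: log2FC and -log10(adjusted p)."""
    return log2_fc, -math.log10(padj)


# Hypothetical DE results: gene -> (log2 fold change, adjusted p-value)
results: Dict[str, Tuple[float, float]] = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.7, 0.02),
    "GENE_C": (0.4, 0.001),
    "GENE_D": (3.0, 0.40),
}
labels = {gene: classify_gene(fc, p) for gene, (fc, p) in results.items()}
points = {gene: volcano_point(fc, p) for gene, (fc, p) in results.items()}
print(labels)
```

The resulting coordinates and labels can be passed to any plotting library. Keeping the classification separate from the plotting call makes the thresholds easy to audit.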
This workflow enables researchers to extract full transcriptomic insights from complex or degraded samples. For optimized execution from sample to insight, companies like Biostate AI offer end-to-end solutions, including low-input compatibility, AI-driven pipelines, and comprehensive support for both coding and non-coding RNA analysis.
After generating high-quality sequencing data, the next crucial phase is data analysis. This section breaks down the key steps to process, quantify, and interpret total RNA-seq results effectively.
Data Analysis Pipeline for Total RNA-Seq

Total RNA sequencing introduces a complex landscape of transcripts: spliced, unspliced, coding, non-coding, and often degraded. Effective total RNA sequencing data analysis demands robust preprocessing, context-aware quantification, and methodical statistical interpretation to extract meaningful biological insights.
Here’s a step-by-step breakdown:
1. Raw Read QC & Preprocessing
Total RNA-seq reads typically contain remnants of adapter sequences, low-quality bases, and occasionally rRNA contamination.
Key tools:
- FastQC for per-base quality, GC content, and duplication rate visualization
- fastp or Trim Galore for automated trimming and filtering
Steps:
- Adapter trimming: Removes 3′ and 5′ adapters, especially crucial for degraded RNA or FFPE-derived samples.
- Quality filtering: Discard reads whose mean Q-score falls below a chosen cutoff (commonly between Q20 and Q30).
- Contaminant removal: Remove residual rRNA reads using SortMeRNA or BBDuk when rRNA depletion is incomplete.
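The quality-filtering step above can be sketched in a few lines. This example assumes Phred+33 encoded quality strings and uses a plain arithmetic mean of the per-base scores with a cutoff of Q20; the record layout and function names are hypothetical. Dedicated tools such as fastp apply more elaborate rules, so treat this only as an illustration of the idea.

```python
from typing import Iterable, Iterator, Tuple

# (read name, bases, quality string); a simplified, assumed record layout
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (assumes Phred+33)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(records: Iterable[FastqRecord],
                 min_mean_q: float = 20.0) -> Iterator[FastqRecord]:
    """Yield only reads whose mean quality meets the cutoff."""
    for name, seq, qual in records:
        if qual and mean_phred(qual) >= min_mean_q:
            yield name, seq, qual


reads = [
    ("read1", "ACGT", "IIII"),  # 'I' encodes Q40
    ("read2", "ACGT", "####"),  # '#' encodes Q2
    ("read3", "ACGT", "II##"),  # mean of 40, 40, 2, 2 is Q21
]
kept = [name for name, _, _ in filter_reads(reads)]
print(kept)
```

Note that read3 passes despite two very poor bases, which is one reason many pipelines combine a mean-quality filter with per-base or sliding-window trimming.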
2. Alignment & Mapping
Unlike poly(A)-selected libraries, total RNA-seq data includes unspliced pre-mRNAs, lncRNAs, and overlapping antisense RNAs, which calls for splice-aware aligners with careful tuning.
Recommended aligners:
- STAR: Fast, highly accurate, supports two-pass mapping and chimeric alignment
- HISAT2: Lightweight, with a low memory footprint; supports SNP-aware alignment when the index is built with known variants
Challenges to address:
- Non-coding RNA mapping: Many ncRNAs overlap with coding genes or originate from repetitive regions. Enable soft clipping and multi-mapping.
- Multimapping reads: lncRNAs, snoRNAs, and pseudogenes often have multiple genomic loci. Assign with caution or filter out based on alignment scores.
Best practices:
- Use comprehensive annotations (GENCODE v35 or later, or Ensembl) with both coding and non-coding features.
- Activate --quantMode TranscriptomeSAM in STAR if quantification tools like RSEM or Salmon are downstream.
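For readers who script their pipelines, the sketch below assembles a STAR command line that reflects the practices above: two-pass mapping and transcriptome-coordinate output for downstream quantifiers. It only builds and prints the argument list; nothing is executed. The file names, index directory, thread count, and the `build_star_command` helper are hypothetical placeholders, and the option set is a starting point to be checked against the STAR manual for your version.

```python
import shlex
from typing import List


def build_star_command(genome_dir: str, read1: str, read2: str,
                       out_prefix: str, threads: int = 8) -> List[str]:
    """Assemble (but do not run) a STAR call for paired-end, gzipped reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", read1, read2,
        "--readFilesCommand", "zcat",         # inputs assumed to be gzipped
        "--twopassMode", "Basic",             # two-pass mapping
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--quantMode", "TranscriptomeSAM",    # for RSEM or Salmon downstream
        "--outFileNamePrefix", out_prefix,
    ]


# Hypothetical paths, for illustration only
cmd = build_star_command("star_index/", "sample_R1.fastq.gz",
                         "sample_R2.fastq.gz", "sample_")
print(shlex.join(cmd))
```

Building the command as a list keeps it safe to hand to `subprocess.run` later without shell quoting problems.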
3. Quantification of RNA Species
Total RNA-seq quantifies a wider array of RNA species beyond traditional mRNAs. Using the right quantification strategy ensures accurate expression estimates.
Popular tools:
- HTSeq / featureCounts for gene-level counts
- Salmon / kallisto for transcript-level quantification (selective alignment or pseudoalignment against the transcriptome, without a full genome alignment)
Considerations:
- Enable stranded mode to accurately quantify antisense and overlapping transcripts.
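- Use Salmon’s selective alignment mode to reduce spurious mappings and improve isoform-level estimates, especially for lncRNAs. circRNAs are not represented in a linear reference transcriptome, so they generally need dedicated back-splice junction detection tools.
- Normalize for transcript length and sequencing depth (e.g., TPM) when comparing abundance within a sample or visualizing expression; keep raw counts for differential expression testing.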
Note: GENCODE annotations are vital for distinguishing lncRNAs, snoRNAs, pseudogenes, and antisense RNAs.
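A small worked example may help show what length and depth normalization does. The sketch below computes TPM from raw counts and transcript lengths: counts are first divided by length in kilobases, then rescaled so the values sum to one million. The transcript names, counts, and lengths are made up for illustration, and the plain annotated length is used where real tools would use an effective length.

```python
from typing import Dict


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million from raw counts and transcript lengths."""
    # Step 1: reads per kilobase corrects for transcript length
    rpk = {t: counts[t] / (lengths_bp[t] / 1000.0) for t in counts}
    # Step 2: rescale so all values sum to one million (corrects for depth)
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {t: 0.0 for t in counts}
    return {t: value / scale for t, value in rpk.items()}


# Hypothetical counts and lengths
counts = {"mRNA_1": 1000, "lncRNA_1": 1000, "snoRNA_1": 100}
lengths = {"mRNA_1": 2000, "lncRNA_1": 1000, "snoRNA_1": 100}
values = tpm(counts, lengths)
for name, value in values.items():
    print(f"{name}\t{value:.0f}")
```

Here the short snoRNA ends up with the same TPM as the lncRNA despite a tenfold lower raw count, which is exactly the length effect that raw counts hide.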
4. Differential Expression Analysis
Differential expression (DE) analysis in total RNA-seq data includes both mRNAs and non-coding RNAs. Specialized handling is essential.
Tools & Methods:
- DESeq2: Default for most DE studies; handles raw counts
- edgeR: Empirical Bayes modeling; ideal for small sample sizes
- limma-voom: For large datasets with good quality control
Normalization methods:
- Use library-size correction (e.g., median-of-ratios in DESeq2)
- Avoid TPMs or FPKMs for DE analysis; stick to raw counts
- Include ERCC spike-ins if used, especially in clinical RNA-seq
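The median-of-ratios idea mentioned above can be sketched as follows. For each gene, a geometric mean across samples serves as a pseudo-reference; each sample's size factor is the median of its count-to-reference ratios. This is a simplified illustration of the approach, not DESeq2's code: the function name and toy counts are assumptions, and genes with any zero count are simply skipped.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(counts: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios size factors; counts maps gene -> per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c <= 0 for c in gene_counts):
            continue  # a zero makes the geometric mean zero, so skip the gene
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 was sequenced about twice as deeply as sample 1
raw = {
    "g1": [100, 200],
    "g2": [50, 100],
    "g3": [10, 20],
    "g4": [0, 5],  # skipped because of the zero
}
factors = size_factors(raw)
normalized_g1 = [c / f for c, f in zip(raw["g1"], factors)]
print(factors, normalized_g1)
```

After dividing by the size factors, g1 has the same normalized value in both samples, which is what one would expect when the only difference is depth.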
Statistical considerations:
- Design matrices must account for batch effects, paired samples, and interaction terms
- Apply false discovery rate (FDR) correction for multiple testing
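One common way to apply FDR correction is the Benjamini-Hochberg procedure. The sketch below is a plain-Python version for illustration, assuming a simple list of p-values; the function name is hypothetical, and in practice the DE packages above report adjusted p-values themselves.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]
padj = benjamini_hochberg(pvals)
print([round(p, 4) for p in padj])
```

In this toy example the raw p-values 0.03 and 0.04 would both pass a 0.05 cutoff, but their adjusted values (about 0.053) do not, which shows why the correction matters when thousands of genes are tested.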
5. Alternative Splicing & Isoform Analysis
Capturing exon-level dynamics is a key advantage of total RNA-seq, especially with paired-end reads.
Splicing tools:
- rMATS: Detects exon skipping, intron retention, mutually exclusive exons, and alternative 5′/3′ splice sites
- MAJIQ: Models local splicing variations using probabilistic inference
- SUPPA2: Lightweight; suitable for isoform switching studies
Biological implications:
- Splice variants often differ between healthy and diseased tissues
- Intron retention is linked to stress responses, cancer, and neural differentiation
Context matters: Use known transcript annotations and validate novel splicing with IGV or long-read support if available.
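Most splicing tools report some form of percent spliced in (PSI) for each event. The sketch below shows a simplified PSI for a single exon-skipping event, computed from junction read counts. The length normalization (two junctions support inclusion, one supports skipping) is an assumption made for illustration and is not the exact model used by rMATS, MAJIQ, or SUPPA2; the read counts are invented.

```python
from typing import Optional


def psi(inclusion_reads: int, skipping_reads: int,
        inclusion_len: float = 2.0, skipping_len: float = 1.0) -> Optional[float]:
    """Simplified percent spliced in for one exon-skipping event.

    Reads are divided by an assumed effective length so the two isoforms
    are comparable: inclusion is supported by two junctions, skipping by one.
    """
    inc = inclusion_reads / inclusion_len
    skp = skipping_reads / skipping_len
    if inc + skp == 0:
        return None  # no junction evidence for this event
    return inc / (inc + skp)


# Hypothetical junction counts for the same exon in two conditions
psi_control = psi(inclusion_reads=80, skipping_reads=10)
psi_disease = psi(inclusion_reads=20, skipping_reads=30)
delta_psi = psi_disease - psi_control
print(psi_control, psi_disease, delta_psi)
```

A delta PSI of this size would suggest a strong shift toward exon skipping, but it would still need replicate-aware statistics and visual confirmation before being reported.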
6. Fusion Gene & Transcript Discovery
Total RNA-seq retains non-canonical transcripts, making it suitable for detecting fusions and chimeras common in cancer and rare disorders.
Detection tools:
- STAR-Fusion: Uses STAR’s chimeric output to identify fusion breakpoints
- Arriba: Highly sensitive, integrates with tumor RNA-seq pipelines
Use cases:
- Oncogenic fusion detection (e.g., BCR-ABL1, TMPRSS2-ERG)
- Novel transcript discovery in undiagnosed genetic conditions
Validation strategies:
- Visualize with IGV
- Confirm with RT-PCR or long-read sequencing (e.g., PacBio)
7. Functional Annotation & Pathway Enrichment
Interpreting total RNA-seq results requires mapping expression changes to known functions and pathways.
Functional tools:
- clusterProfiler (R): For GO, KEGG, Reactome enrichment
- g:Profiler, DAVID: Alternative web-based tools
- lncRNA2Function: For annotating long non-coding RNAs
For non-coding RNAs:
- Predict lncRNA targets using co-expression or tools like LncTar, LncADeep
- Integrate with mRNA/miRNA expression to build regulatory networks
Network tools like Cytoscape or WGCNA can reveal higher-order interactions and ncRNA modules.
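Over-representation analysis, the basis of many GO and KEGG enrichment reports, usually comes down to a hypergeometric test: given a background of N genes with K in a pathway, how surprising is it that k of the n differentially expressed genes are pathway members? The sketch below computes that upper-tail probability with the standard library. The function name and the numbers are assumptions for illustration; real tools add multiple-testing correction and careful background selection.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a hypergeometric draw.

    N: genes in the background, K: background genes in the pathway,
    n: genes in the DE list,    k: DE genes that are in the pathway.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy example: 20 background genes, 5 in the pathway, 5 DE genes, 3 overlap
p_value = enrichment_pvalue(k=3, n=5, K=5, N=20)
print(f"{p_value:.4f}")
```

For lncRNAs, the same test is often applied to the protein-coding genes that co-express with the lncRNA, since the lncRNA itself rarely has pathway annotations.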
For researchers lacking in-house infrastructure or bioinformatics expertise, Biostate AI offers AI-driven analysis pipelines, differential expression reporting, and full support for coding and non-coding transcriptomes, ready to scale from clinical RNA-seq to large cohort studies.
While a reliable analysis pipeline is essential, total RNA-seq presents unique technical hurdles. Understanding these challenges is key to ensuring data accuracy and meaningful biological insights.
Technical Challenges in Total RNA-Seq

Total RNA-seq offers broad transcriptome coverage but introduces several analytical and experimental challenges. These must be carefully managed to ensure data quality, reproducibility, and biological relevance throughout the total RNA sequencing data analysis process.
1. Residual rRNA Reads
Even after rRNA depletion, a significant proportion of reads can originate from ribosomal RNA. This is especially common in low-quality or FFPE-derived samples, where fragmented rRNA escapes depletion probes. These reads reduce usable coverage and waste sequencing capacity.
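The cost of residual rRNA is easy to quantify with simple arithmetic. The sketch below estimates how many reads remain usable for a given rRNA fraction, and how many reads would need to be sequenced to reach a usable-read target. The function names and example numbers are assumptions; the rRNA fraction itself would come from your own QC report.

```python
import math


def usable_reads(total_reads: int, rrna_fraction: float) -> int:
    """Reads left after discarding the rRNA-derived fraction."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return round(total_reads * (1.0 - rrna_fraction))


def reads_to_sequence(target_usable: int, rrna_fraction: float) -> int:
    """Total reads needed so the non-rRNA portion meets the target."""
    if not 0.0 <= rrna_fraction < 1.0:
        raise ValueError("rrna_fraction must be in [0, 1)")
    return math.ceil(target_usable / (1.0 - rrna_fraction))


# Hypothetical sample: 50M reads with 25% still mapping to rRNA
remaining = usable_reads(50_000_000, 0.25)
needed = reads_to_sequence(30_000_000, 0.25)
print(remaining, needed)
```

With a quarter of the reads lost to rRNA, a 30M usable-read target requires 40M sequenced reads, which is worth factoring into depth planning for FFPE or degraded samples.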
2. Low-Quality or Limited RNA Input
Samples from clinical settings often contain degraded RNA or low input amounts, leading to poor library complexity and uneven coverage. RNA fragmentation interferes with accurate isoform quantification and introduces bias toward shorter transcripts.
3. Annotation Incompleteness for Non-Coding RNAs
While annotations for protein-coding genes are mature, many long non-coding RNAs, snoRNAs, and antisense transcripts are still uncharacterized or misannotated. This limits accurate quantification, isoform detection, and downstream functional interpretation.
4. Multimapping and Repetitive Regions
Many non-coding RNAs and pseudogenes are derived from repetitive or homologous sequences. Reads from these regions frequently map to multiple genomic loci, complicating alignment and increasing ambiguity in expression estimates.
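Two common ways of handling such reads are to drop them entirely or to split each read evenly across the features it hits. The sketch below contrasts the two on a toy set of alignments. The data structure, gene names, and `count_reads` helper are hypothetical, and the even split is only one of several possible assignment schemes.

```python
from collections import defaultdict
from typing import Dict, List


def count_reads(alignments: Dict[str, List[str]],
                fractional: bool = False) -> Dict[str, float]:
    """Count reads per gene; alignments maps read ID -> genes it aligns to.

    With fractional=False, multimapping reads are discarded.
    With fractional=True, each multimapper contributes 1/n to each of n genes.
    """
    counts: Dict[str, float] = defaultdict(float)
    for genes in alignments.values():
        hits = sorted(set(genes))
        if len(hits) == 1:
            counts[hits[0]] += 1.0
        elif fractional and hits:
            for gene in hits:
                counts[gene] += 1.0 / len(hits)
    return dict(counts)


# Toy alignments: two reads are ambiguous between a lncRNA and a pseudogene
alignments = {
    "r1": ["LNC1"],
    "r2": ["LNC1", "PSEUDO1"],
    "r3": ["PSEUDO1"],
    "r4": ["LNC1", "PSEUDO1"],
}
unique_only = count_reads(alignments)
split_evenly = count_reads(alignments, fractional=True)
print(unique_only, split_evenly)
```

Discarding multimappers halves the counts for both features here, while the even split keeps all reads but cannot tell which locus they truly came from. Neither choice is universally right, so it should be reported alongside the results.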
5. Biases in Library Preparation
Library construction steps such as fragmentation, adapter ligation, and reverse transcription can introduce bias. These biases affect transcript representation, strand specificity, and size distribution, and they are especially problematic for structured or GC-rich RNAs.
6. Batch Variability
Total RNA-seq is sensitive to subtle variations in protocol execution. Differences across extraction kits, library prep batches, or sequencing runs can result in unwanted technical variation. If not accounted for in the experimental design or analysis, these effects can obscure biological signals.
Addressing the technical complexities of total RNA-seq can be time-consuming and resource-intensive. Here’s how Biostate AI makes the process simple and efficient.
How Biostate AI Simplifies Total RNA-Seq

Biostate AI removes the technical complexity from total RNA sequencing by offering a streamlined, end-to-end solution tailored to modern research needs. From sample processing to advanced insights, every stage is optimized for accuracy, speed, and accessibility.
- Affordable End-to-End Workflow: Biostate AI handles everything from sample extraction, rRNA depletion, library prep, sequencing, to data analysis under one platform. This ensures consistency across the entire pipeline and eliminates the need for coordinating across multiple vendors or tools.
- Low-Quality and Low-Input Compatibility: The platform accepts challenging samples, including those with RNA Integrity Numbers (RIN) as low as 2. Whether working with FFPE tissues, blood-derived RNA, or small-volume liquid biopsies, researchers can still generate meaningful transcriptomic data.
- Full Transcriptome Coverage: Biostate AI supports total RNA-seq workflows that retain both polyadenylated and non-polyadenylated transcripts. This allows robust profiling of mRNAs, lncRNAs, circRNAs, pre-mRNAs, and other non-coding RNAs often missed in conventional poly(A)-selected protocols.
- AI-Driven Data Interpretation: With the OmicsWeb platform, researchers can move from raw reads to biological insights rapidly. The AI engine automates QC, differential expression, splicing analysis, and pathway enrichment, reducing reliance on in-house bioinformatics teams.
- Clear Pricing and Fast Turnaround: With transparent pricing starting at $80 per sample and turnaround times as short as 1–3 weeks, Biostate AI removes traditional barriers of cost and delay, making high-quality total RNA-seq feasible at scale.
Biostate AI transforms total RNA-seq into a practical, scalable, and insight-driven process, especially suited for complex research questions involving coding and non-coding transcript dynamics.
Final Words
Total RNA sequencing has emerged as a cornerstone of transcriptomics, offering researchers a complete view of the transcriptome from protein-coding genes to regulatory non-coding RNAs. Yet, it brings significant analytical and technical complexities, including rRNA depletion precision, non-coding RNA quantification, and isoform-level interpretation. Success in total RNA-seq depends on the right blend of sample handling, sequencing strategy, and bioinformatics rigor.
Biostate AI delivers that complete solution. With optimized protocols, FFPE compatibility, and AI-powered analysis through OmicsWeb, we simplify every stage of total RNA-seq. Transparent pricing and fast turnaround ensure your research stays on track without compromising depth or accuracy.
Get in touch with us today to launch your total RNA-seq project with confidence.
FAQs
1. How many reads for total RNA-seq?
Total RNA-seq typically requires 20–50 million paired-end reads per sample to capture a broad range of RNA species, including coding mRNAs and non-coding RNAs, ensuring sufficient depth for reliable quantification and detection of low-abundance transcripts.
2. What is total RNA sequencing data analysis?
It is the comprehensive process of transforming raw sequencing reads into biologically meaningful results by performing quality control, trimming, aligning reads to a reference genome, quantifying gene and transcript abundance, identifying splice variants, and analyzing expression patterns.
3. How to analyze RNA-seq data step by step?
Key steps include: 1) Perform quality control on raw reads (FastQC), 2) Trim adapters and low-quality bases (Trimmomatic), 3) Align reads to a reference genome (STAR/HISAT2), 4) Quantify expression at gene or transcript level (featureCounts/Salmon), 5) Normalize data (DESeq2/edgeR), 6) Conduct differential expression analysis, 7) Perform functional enrichment and pathway analysis.
4. What is the workflow of RNA-seq data analysis?
The typical workflow involves sample extraction and library prep, sequencing, raw data QC, adapter trimming, alignment to a reference genome, expression quantification, normalization, differential expression testing, and downstream biological interpretation such as gene ontology and pathway enrichment.
