Human dsRNAome Resource

About

dsRNAscan is a computational pipeline that scans any given RNA/DNA sequence for regions capable of forming double-stranded RNA structures. By default, it uses a scanning window approach, analyzing overlapping 10 kb windows to report inverted repeats with at least 25 base pairs and at least 70% pairing. Applied to GRCh38, dsRNAscan produced 5,134,754 predictions, of which approximately 2.4 million are scored as high-confidence by at least one of three machine-learning models (GTEx, Stability, and Probing).
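
The windowing and default thresholds described above can be sketched as follows. This is only an illustration, not dsRNAscan's own code: the 10 kb window size and the 25 bp / 70% thresholds come from the text, while the 5 kb step (the amount of overlap) and the function names are assumptions made for the example.

```python
from typing import Iterator, Tuple

WINDOW = 10_000  # window size stated in the text
STEP = 5_000     # ASSUMPTION: the overlap between windows is not stated in the source


def windows(seq_len: int, window: int = WINDOW, step: int = STEP) -> Iterator[Tuple[int, int]]:
    """Yield 0-based, half-open (start, end) windows that cover seq_len."""
    if seq_len <= 0:
        return
    start = 0
    while True:
        end = min(start + window, seq_len)
        yield start, end
        if end == seq_len:
            break
        start += step


def passes_defaults(n_base_pairs: int, percent_paired: float) -> bool:
    """Default reporting thresholds: at least 25 bp and at least 70% pairing."""
    return n_base_pairs >= 25 and percent_paired >= 70.0
```

With these assumed settings, a 22 kb sequence is covered by four windows, the last one shortened to end at the sequence boundary.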

Paper: Andrews RJ and Bass BL. dsRNAscan maps human dsRNAome revealing conservation, intermolecular dsRNA, and determinants of ADAR dependency.
Code: github.com/Bass-Lab/dsRNAscan

Using the Shiny table

dsrna.chpc.utah.edu/shiny/dsrna/

Use the Shiny table to search the dsRNAome and download subsets of it.

Search (top of sidebar). Enter gene symbols (MYC, ADAR), Ensembl IDs (ENSG00000136997), or both in the same box, comma or newline separated. The Locus box accepts chr4, chr4:1,000,000, or chr4:1,000,000-2,000,000. Click Apply Filters to run.
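
For scripted use, the accepted input forms can be parsed along these lines. This is a sketch of one possible parser, not the app's own code; the regular expressions and function names are assumptions, and the ENSG pattern simply follows the example ID given above.

```python
import re
from typing import List, Optional, Tuple

# chr4, chr4:1,000,000, or chr4:1,000,000-2,000,000
LOCUS_RE = re.compile(r"^(chr[0-9A-Za-z_]+)(?::([\d,]+)(?:-([\d,]+))?)?$")


def split_terms(text: str) -> Tuple[List[str], List[str]]:
    """Split a comma- or newline-separated box into (gene symbols, Ensembl IDs)."""
    terms = [t.strip() for t in re.split(r"[,\n]+", text) if t.strip()]
    ensembl = [t for t in terms if re.fullmatch(r"ENSG\d{11}", t)]
    symbols = [t for t in terms if t not in ensembl]
    return symbols, ensembl


def parse_locus(text: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (chrom, start, end); start and end are None when not given."""
    match = LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised locus: {text!r}")
    chrom, start, end = match.groups()

    def to_int(value: Optional[str]) -> Optional[int]:
        return int(value.replace(",", "")) if value else None

    return chrom, to_int(start), to_int(end)
```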

Confidence sliders. Three sliders set minimum GTEx, Stability, and Probing model scores. Each has a High Confidence toggle that snaps the slider to that model's high-confidence cutoff (0.25, 0.25, and 0.46, respectively). A dsRNA is listed only if it passes every active filter.
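
The same all-filters-must-pass logic can be reproduced on downloaded data. A minimal sketch, assuming scores are compared with "greater than or equal to" (the app's exact comparison is not stated) and using hypothetical dictionary keys for the three models:

```python
from typing import Dict, Optional

# High-confidence cutoffs quoted in the text
HIGH_CONF = {"gtex": 0.25, "stability": 0.25, "probing": 0.46}


def passes(scores: Dict[str, float], thresholds: Dict[str, Optional[float]]) -> bool:
    """True when every active (non-None) threshold is met.

    ASSUMPTION: a score equal to the cutoff counts as passing.
    """
    return all(
        scores[model] >= cutoff
        for model, cutoff in thresholds.items()
        if cutoff is not None
    )
```

Setting a model's threshold to None mimics leaving that slider inactive.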

Other filters. Percent paired (70-100%), length range (25-5000 bp; at least one arm must fall in the range), folding energy (0 = no filter; more negative is stricter), "Conserved only" (phastCons > 0.5 on both arms), and "Has RNA editing sites".
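
Note that the length filter needs only one arm in range, whereas the conservation filter needs both arms above the cutoff. The sketch below spells out that difference; the function names are hypothetical and the inclusive range boundaries are an assumption.

```python
def length_ok(i_len: int, j_len: int, lo: int = 25, hi: int = 5000) -> bool:
    """Length filter: at least ONE arm must fall inside [lo, hi]."""
    return lo <= i_len <= hi or lo <= j_len <= hi


def conserved(i_phast: float, j_phast: float, cutoff: float = 0.5) -> bool:
    """'Conserved only': phastCons must exceed the cutoff on BOTH arms."""
    return i_phast > cutoff and j_phast > cutoff


def energy_ok(energy: float, max_energy: float = 0.0) -> bool:
    """Folding-energy filter: 0 disables it; otherwise keep energies at or below it."""
    return max_energy == 0 or energy <= max_energy
```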

Summary tab. Shows the total match count, the filters that were applied, and 2x2 breakdowns by editing and repetitive status.

Table views. The table opens in Essentials view (~15 commonly-used columns). Click All columns to reveal everything (per-arm coordinates, raw ML scores, conservation scores, sequences, etc.).

Downloads. Click Download... to open a dialog with two choices: Format (CSV, compressed CSV.gz, or Parquet) and Columns (a checkbox per database column, pre-selected to the Essentials set). Quick links above the checkbox list switch between Essentials / All / None. Downloads always include the full match set, regardless of the in-table cap. A live size estimate updates as you toggle columns. The default download size is also shown under the top button.

A note on result limits. The in-table cap (default 1,000, max 50,000) only controls how many rows are rendered for browsing - increase it to scroll more, decrease it for speed. The table shows the first N rows in SQLite scan order, not a random sample, so it may not be representative of the full match set. Plots use a random sample of up to 20,000 rows from the full match set, so their distributions are representative.
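
The difference between the two row selections can be illustrated in a few lines. This is a sketch of the idea only, not the app's implementation (which presumably runs inside SQLite); the function names and the fixed seed are assumptions.

```python
import random
from itertools import islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def table_rows(matches: Iterable[T], cap: int = 1_000) -> List[T]:
    """First N rows in scan order, with N limited to the 50,000 maximum."""
    return list(islice(matches, min(cap, 50_000)))


def plot_rows(matches: Iterable[T], k: int = 20_000, seed: int = 0) -> List[T]:
    """Uniform random sample of up to k rows drawn from the full match set."""
    rows = list(matches)
    if len(rows) <= k:
        return rows
    return random.Random(seed).sample(rows, k)  # ASSUMPTION: seed fixed for repeatability
```

If the matches happen to be ordered by chromosome, the table rows would all come from the start of that ordering, while the plot sample would span the whole set.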

Using the IGV browser

dsrna.chpc.utah.edu/igv.html

The IGV browser places predicted dsRNAs in genomic context alongside standard tracks (gene models, A-to-I editing sites, RBP binding).

Each prediction appears as a coloured arc spanning its two arms; darker shades indicate higher confidence. Use the sidebar filter panel to control which arcs are displayed - same scores and thresholds as the Shiny app. Click any arc for the underlying ML scores, energy, and a link to the structure visualization on FORNA.

Jump to a region by typing a gene symbol or coordinate in the IGV search box (e.g. MYC or its GRCh38 location, chr8:127,735,000-127,743,000).

Using the structure browser

dsrna.chpc.utah.edu/api/dsrna-browser/

The structure browser shows one dsRNA at a time with its predicted secondary structure, RNA structure probing reactivity, ML scores, conservation, and editing sites.

Search for a specific dsRNA by ID or gene name. The structure diagram is drawn with FORNA and coloured by reactivity where probing data is available. Per-dsRNA downloads are available in FASTA, CT, and DBN formats for downstream modelling (RNAfold, inforna, RoseTTAFoldNA, etc.).
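
A downloaded DBN file can be read with a few lines of code before it is passed to other tools. The sketch below assumes the common three-line dot-bracket layout (a ">" header, a sequence line, and a structure line); the actual layout of the files served by the browser should be checked, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DbnRecord:
    name: str
    sequence: str
    structure: str


def parse_dbn(text: str) -> DbnRecord:
    """Parse one record, ASSUMING header / sequence / dot-bracket lines in that order."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or not lines[0].startswith(">"):
        raise ValueError("expected a '>' header, a sequence line, and a structure line")
    record = DbnRecord(lines[0][1:].strip(), lines[1], lines[2])
    if len(record.sequence) != len(record.structure):
        raise ValueError("sequence and structure lengths differ")
    # Check that every ')' closes an earlier '(' and nothing is left open.
    depth = 0
    for ch in record.structure:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            raise ValueError("unbalanced brackets")
    if depth != 0:
        raise ValueError("unbalanced brackets")
    return record
```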

Loading the Parquet dataset

dsrna.chpc.utah.edu/tools/bedtools_recipes.html#parquet

For analysis in Python / R / SQL, the primary file is dsRNA_human_v1.parquet (~325 MB, zstd-compressed). It holds all 5.1M dsRNAs and every analytical column except RNA sequence and predicted structure, which are in a separate _extended companion file joined on dsRNA_id. Parquet is columnar, so column projection and filter pushdown keep queries fast even at full size.

The recipes page has worked snippets for the three common loaders - pandas (familiar, eager), polars (lazy scans, fastest cold reads of the full file), and DuckDB (SQL directly on the parquet, including joins with the extended companion). Each example uses real columns - n_models_high_conf, i_phast100 / j_phast100 for conservation, stranded_editing_* for editing, gtex_model_score for ML confidence.

For full column descriptions, see the data_dictionary.tsv or use the loader's introspection (pl.scan_parquet(...).collect_schema() / DESCRIBE SELECT * FROM 'file.parquet').
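
As an illustration of a column-projected, filtered query, the sketch below builds a DuckDB SQL statement over the Parquet file. It is not one of the recipes from the page: the column names are those mentioned above, while the function names, the default cutoff, and the local file path are assumptions. DuckDB is imported lazily so the query builder can be used without it installed.

```python
def build_query(parquet_path: str, min_gtex: float = 0.25, limit: int = 10) -> str:
    """SQL that reads only the named columns and filters on the GTEx model score."""
    safe_path = parquet_path.replace("'", "''")  # escape quotes inside the SQL literal
    return (
        "SELECT dsRNA_id, gtex_model_score, n_models_high_conf, i_phast100, j_phast100 "
        f"FROM read_parquet('{safe_path}') "
        f"WHERE gtex_model_score >= {float(min_gtex)} "
        "ORDER BY gtex_model_score DESC "
        f"LIMIT {int(limit)}"
    )


def run_query(parquet_path: str, **kwargs):
    """Run the query with DuckDB (third-party package) and return a list of tuples."""
    import duckdb  # imported here so build_query works without DuckDB installed

    return duckdb.sql(build_query(parquet_path, **kwargs)).fetchall()
```

For example, run_query("dsRNA_human_v1.parquet", min_gtex=0.5) would return the ten highest-scoring rows above 0.5, assuming the file has been downloaded to the working directory.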

Bulk analysis with BEDPE & bedtools

dsrna.chpc.utah.edu/tools/bedtools_recipes.html

For command-line / pipeline use, the dsRNAome is also published as BEDPE: one row per dsRNA with two intervals (i-arm + j-arm) and 28 extra columns carrying lengths, energy, conservation, ML scores, editing counts, and repeat annotations. Six subsets are available - all dsRNAs, per-model high-confidence, and any-of-3 / all-3 high-confidence.

The recipes page collects worked examples for the common questions: filtering by ML confidence with awk, overlap with m6A sites, with GENCODE exons (both arms in same transcript, asymmetric exon-intron, loop spans an exon), with ENCODE eCLIP peaks (ADAR1 / Staufen co-binding), and converting BEDPE to BAM for IGV via bedpetobam.
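
The recipes page uses awk for confidence filtering; for pipelines already written in Python, an equivalent filter might look like the sketch below. The position of each score column is not given here, so score_col is a parameter to be set from the data dictionary, and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def filter_bedpe(lines: Iterable[str], score_col: int, cutoff: float) -> Iterator[str]:
    """Yield BEDPE lines whose 1-based column `score_col` is >= cutoff.

    ASSUMPTION: score_col must be looked up by the user; it is not stated in the text.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        try:
            score = float(fields[score_col - 1])
        except (IndexError, ValueError):
            continue  # skip rows with a missing or non-numeric score
        if score >= cutoff:
            yield line.rstrip("\n")
```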

One critical gotcha: most bedtools subcommands (intersect, sort, merge, coverage) read columns 1-3 as the interval and silently ignore the j-arm in columns 4-6. Use bedtools pairtobed for overlap questions, or split into per-arm BEDs first. The recipes page has the full warning and split-then-rejoin example.
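
The split-then-rejoin idea can be sketched as follows. This is not the recipe from the page: it relies only on the BEDPE layout stated above (i-arm in columns 1-3, j-arm in columns 4-6), and the "|i" / "|j" name suffix and the function names are assumptions for the example.

```python
from typing import Dict, Iterable, Set, Tuple

Bed = Tuple[str, int, int, str]


def split_arms(bedpe_line: str, row_id: str) -> Tuple[Bed, Bed]:
    """Turn one BEDPE row into two BED records tagged with the arm they came from."""
    f = bedpe_line.rstrip("\n").split("\t")
    i_arm = (f[0], int(f[1]), int(f[2]), f"{row_id}|i")
    j_arm = (f[3], int(f[4]), int(f[5]), f"{row_id}|j")
    return i_arm, j_arm


def rejoin(hit_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Group per-arm hit names (e.g. from an intersect) back by dsRNA ID."""
    hits: Dict[str, Set[str]] = {}
    for name in hit_names:
        row_id, arm = name.rsplit("|", 1)
        hits.setdefault(row_id, set()).add(arm)
    return hits
```

After intersecting the per-arm BED records with a feature file, rejoin shows which dsRNAs were hit on one arm and which on both, for example {"d1": {"i", "j"}, "d2": {"i"}}.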

Key terms

i-arm / j-arm - the two paired arms of the inverted repeat that forms the dsRNA.
Length (i / j) - nucleotides on each arm, separately.
Percent paired - percentage of nucleotides in the structure that are base-paired.
Longest helix - the longest contiguous run of base pairs without bulge or mismatch.
Energy (kcal/mol) - thermodynamic stability from RNAduplex. More negative is more stable.
Edited - the dsRNA overlaps A-to-I editing sites from REDIportal v3 on both arms.
Repetitive / non-repetitive - whether an arm intersects an Alu, LINE, SINE, LTR, or simple repeat. Over 90% of predicted dsRNAs are repetitive.
GTEx Model score - probability the dsRNA matches edited-dsRNA features, including GTEx tissue expression. High-confidence cutoff: 0.25 (AUC 0.961).
Stability Model score - same idea using structural features only (no expression). Cutoff: 0.25 (AUC 0.914).
Probing Model score - probability the dsRNA agrees with RNA structure probing data. Min-max normalised to 0-1 in this database. Cutoff: 0.46 (AUC 0.847).
phastCons 100 / 17 - nucleotide conservation scores across 100 vertebrates and 17 primates, 0-1.
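
Several of these terms can be computed directly from a dot-bracket structure. The sketch below shows one reading of the definitions of percent paired, longest helix, and min-max normalisation; it is not the pipeline's own code. It assumes balanced brackets, treats "&" as a strand separator (as in RNAduplex output), and uses hypothetical function names.

```python
from typing import Dict, List


def pair_table(struct: str) -> Dict[int, int]:
    """Map each paired position to its partner (assumes balanced brackets)."""
    stack: List[int] = []
    partner: Dict[int, int] = {}
    for pos, ch in enumerate(struct):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            left = stack.pop()
            partner[left], partner[pos] = pos, left
    return partner


def percent_paired(struct: str) -> float:
    """Percentage of nucleotides that are base-paired; '&' is not a nucleotide."""
    s = struct.replace("&", "")
    if not s:
        return 0.0
    return 100.0 * sum(ch in "()" for ch in s) / len(s)


def longest_helix(struct: str) -> int:
    """Longest run of stacked pairs (i, j), (i+1, j-1), ... with no bulge or mismatch."""
    partner = pair_table(struct)
    best = run = 0
    prev = None
    for pos in sorted(p for p in partner if p < partner[p]):
        stacked = prev is not None and pos == prev + 1 and partner[pos] == partner[prev] - 1
        run = run + 1 if stacked else 1
        best = max(best, run)
        prev = pos
    return best


def min_max(values: List[float]) -> List[float]:
    """Rescale values to 0-1, as described for the Probing model score."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```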

Citing

If you use the dsRNAome in published work, please cite:

Andrews RJ and Bass BL. dsRNAscan maps human dsRNAome revealing conservation, intermolecular dsRNA, and determinants of ADAR dependency.

Questions or feedback: ryan.andrews@biochem.utah.edu, bbass@biochem.utah.edu.
