Human dsRNAome Resource

Predictions of double-stranded RNA structures across the human genome (GRCh38 / hg38). 5,134,754 dsRNAs from dsRNAscan + RNAduplex, annotated with conservation, gene context, editing sites, and three ML model scores.

v1 · Generated 2026-05-23 · Pipeline 2025-06-19 · All files served from https://dsrna.chpc.utah.edu/tools/downloads/

Browser tracks & pipeline files

GFF3 for IGV / UCSC, BEDPE for bedtools pairtobed, BP arcs, high-confidence subsets, per-model variants, sequences companion.


Filter dsRNAs interactively

Search by gene, locus, or model score and inspect dsRNAs without downloading anything. Best for picking out specific predictions.

Open the Shiny table · User guide

Download the main dataset

One Parquet file with all 5,134,754 dsRNAs and every analytical column (except RNA sequence and structure). Reads in seconds with pandas, polars, or DuckDB.

Download dsRNA_human_v1.parquet (325 MB) · Column dictionary (TSV)
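A minimal sketch of loading the main Parquet with pandas and attaching sequences only when needed. The helper names `load_main` and `load_with_sequences` and the local file paths are hypothetical, and the sketch assumes pandas plus a Parquet engine (pyarrow or fastparquet) is installed. The join key `dsRNA_id` and the score column names are the ones documented on this page; it is assumed that `dsRNA_id` is present in both files.

```python
# Columns documented on this page: the join key plus the three model scores.
SCORE_COLUMNS = [
    "dsRNA_id",
    "gtex_model_score",
    "stability_model_score",
    "structure_probing_score",
]


def load_main(path="dsRNA_human_v1.parquet", columns=None):
    """Read the main Parquet; pass `columns` to load only a subset."""
    import pandas as pd  # lazy import: needs pandas plus pyarrow or fastparquet

    return pd.read_parquet(path, columns=columns)


def load_with_sequences(main_df, extended_path="dsRNA_human_v1_extended.parquet"):
    """Left-join the sequences/structures companion onto already-selected rows."""
    import pandas as pd

    extended = pd.read_parquet(extended_path)
    # Keep only companion rows for the dsRNAs of interest before merging.
    extended = extended[extended["dsRNA_id"].isin(main_df["dsRNA_id"])]
    return main_df.merge(extended, on="dsRNA_id", how="left")
```

For example, `scores = load_main(columns=SCORE_COLUMNS)` reads only four columns, and `load_with_sequences(scores.head(100))` adds the companion columns for the first 100 rows. Filtering before the join avoids merging all 5,134,754 rows when only a few are needed.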


File	Format	Size	# dsRNAs	Description
dsRNA_human_v1.parquet	Parquet (zstd)	325 MB	5,134,754	Main dataset for analysis. Every column except sequence and predicted_structure. Recommended starting point for most users — filter, group, plot with pandas / polars / DuckDB.
dsRNA_human_v1_extended.parquet	Parquet (zstd)	522 MB	5,134,754	Sequences and structures. Companion to the main parquet, keyed by `dsRNA_id`. Join when you need the RNA sequence or the dot-bracket structure.
data_dictionary.tsv	TSV	6.6 KB	—	Column reference. Machine-readable per-column descriptions, types, units, and source-column traces.
README.md	Markdown	—	—	Markdown copy of this page — included with the dataset for offline reading.
dsRNA_all.gff3.gz	bgzipped GFF3	1.5 GB	5,134,754	All 5.1M dsRNAs for genome browsers. mRNA / exon features (one mRNA + two exon rows per dsRNA, see encoding section below). Loads directly in IGV, UCSC, JBrowse.
dsRNA_high_confidence.gff3.gz	bgzipped GFF3	955 MB	2,458,476	High-confidence subset for genome browsers. dsRNAs called high-confidence by at least one of the three ML models.
intermolecular_dsRNA.gff3	GFF3	small	769	Intermolecular dsRNAs. Sense–antisense gene pairs forming dsRNAs between separate transcripts (distinct from the intramolecular hairpins in the main dataset).
dsRNA_all_slim.bedpe.gz	bgzipped BEDPE	240 MB	5,134,754	All 5.1M dsRNAs for bedtools pipelines. One row per dsRNA, two intervals (i-arm + j-arm). Built for `bedtools pairtobed`. See the BEDPE warning below.
dsRNA_high_conf_any_slim.bedpe.gz	bgzipped BEDPE	123 MB	2,458,476	High-confidence by any of 3 models. Same schema as the all-dsRNAs BEDPE; subset to dsRNAs called high-confidence by at least one of GTEx / Stability / Probing.
dsRNA_high_conf_all3_slim.bedpe.gz	bgzipped BEDPE	77 MB	1,509,138	High-confidence by all 3 models. Strictest tier — called high-confidence by GTEx, Stability, AND Probing.
dsRNA_gtex_high_conf_slim.bedpe.gz	bgzipped BEDPE	88 MB	1,713,035	High-confidence by the GTEx model, regardless of the other two models. `gtex_model_score ≥ 0.2513`.
dsRNA_stability_high_conf_slim.bedpe.gz	bgzipped BEDPE	107 MB	2,125,678	High-confidence by the Stability model, regardless of the other two models. `stability_model_score ≥ 0.2471`.
dsRNA_probing_high_conf_slim.bedpe.gz	bgzipped BEDPE	111 MB	2,226,288	High-confidence by the Probing model, regardless of the other two models. `structure_probing_score ≥ 0.0315` (raw scale).

Each .bedpe.gz and .gff3.gz has a sibling tabix index at the same URL with .tbi appended (e.g. dsRNA_all_slim.bedpe.gz.tbi). Download the index separately if you want random-access region queries.
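The naming rule above can be sketched as a small helper. The function name `index_url` is hypothetical, and the sketch assumes each file sits directly under the downloads URL given at the top of this page.

```python
BASE_URL = "https://dsrna.chpc.utah.edu/tools/downloads/"


def index_url(filename, base_url=BASE_URL):
    """Return the URL of the tabix index that accompanies a bgzipped track."""
    # Only the bgzipped BEDPE and GFF3 files are described as having an index.
    if not filename.endswith((".bedpe.gz", ".gff3.gz")):
        raise ValueError(f"no tabix index is documented for {filename!r}")
    return base_url + filename + ".tbi"


print(index_url("dsRNA_all_slim.bedpe.gz"))
```

Saving the `.tbi` file next to the data file under the same name plus `.tbi` is the layout tabix-aware tools typically look for.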

Working with the files

Worked examples for loading, filtering, and overlap analysis live on the recipes page:

Loading the Parquet — pandas, polars, DuckDB; joining the sequences companion.
GFF3 encoding & IGV recipe — mRNA + exon parent-child layout, one-click load.
BEDPE & bedtools recipes — awk filtering by confidence column, pairtobed for m6A / exon / eCLIP overlap, bedpetobam → IGV, and the critical "intersect silently breaks BEDPE" warning.
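To illustrate why paired-interval overlap needs both arms, here is a minimal pure-Python sketch of the "either arm overlaps" test. It assumes the slim BEDPE files use the standard BEDPE layout for their first six columns (chrom1, start1, end1, chrom2, start2, end2); the row and the feature are made-up toy values, and the function names are hypothetical. It is not a replacement for `bedtools pairtobed`.

```python
def overlaps(chrom_a, start_a, end_a, chrom_b, start_b, end_b):
    """Half-open interval overlap on the same chromosome (BED convention)."""
    return chrom_a == chrom_b and start_a < end_b and start_b < end_a


def arms_hit(bedpe_fields, feature):
    """Return (i_arm_hit, j_arm_hit) for one BEDPE row against one feature."""
    chrom1, start1, end1, chrom2, start2, end2 = bedpe_fields[:6]
    i_hit = overlaps(chrom1, int(start1), int(end1), *feature)
    j_hit = overlaps(chrom2, int(start2), int(end2), *feature)
    return i_hit, j_hit


# Toy row: i-arm chr1:1000-1020, j-arm chr1:1100-1120 (assumed column order).
row = "chr1\t1000\t1020\tchr1\t1100\t1120\tdsRNA_example".split("\t")
feature = ("chr1", 1110, 1111)  # hypothetical single-base site on the j-arm

i_hit, j_hit = arms_hit(row, feature)
print(i_hit, j_hit, i_hit or j_hit)
```

In this toy case only the j-arm is hit, so a tool that checks just the first interval of each row would report no overlap for this dsRNA.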

High-confidence definition

Three independent ML models score each dsRNA. A dsRNA is considered high-confidence by a given model when its raw score meets or exceeds (>=) the documented threshold:

Model	Column	Threshold	High-conf count
GTEx editing	`gtex_model_score`	>= 0.2513	1,713,035
Stability (no-GTEx)	`stability_model_score`	>= 0.2471	2,125,678
Structure-probing 3'UTR	`structure_probing_score`	>= 0.0315	2,226,288

Any of 3: 2,458,476 dsRNAs
All 3: 1,509,138 dsRNAs

The n_models_high_conf column (integer 0-3) is the sum of the three boolean *_high_conf columns. Filter on it for tiered confidence levels.
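A minimal sketch of how the thresholds combine into a per-dsRNA model count, using plain Python and made-up scores. The score column names and cutoffs are the ones in the table above; the function name `count_models_high_conf` and the three toy records are hypothetical, and the sketch assumes no score is missing.

```python
# Thresholds from the high-confidence table (inclusive, i.e. score >= cutoff).
THRESHOLDS = {
    "gtex_model_score": 0.2513,
    "stability_model_score": 0.2471,
    "structure_probing_score": 0.0315,
}


def count_models_high_conf(record):
    """Number of models (0-3) whose score meets or exceeds its threshold."""
    return sum(record[column] >= cutoff for column, cutoff in THRESHOLDS.items())


# Toy records with invented scores, for illustration only.
records = [
    {"gtex_model_score": 0.30, "stability_model_score": 0.25, "structure_probing_score": 0.05},
    {"gtex_model_score": 0.2513, "stability_model_score": 0.10, "structure_probing_score": 0.01},
    {"gtex_model_score": 0.10, "stability_model_score": 0.20, "structure_probing_score": 0.03},
]

counts = [count_models_high_conf(record) for record in records]
print(counts)  # [3, 1, 0]
```

A count of at least 1 corresponds to the "any of 3" tier and a count of 3 to the "all 3" tier. The second record shows that a score exactly at the cutoff counts as high-confidence.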

Note: gtex_confidence_label and stability_confidence_label are pre-computed string labels from the upstream pipeline and almost always agree with the booleans. Use the booleans for filtering and the labels for cross-checks.

Interactive browser

For exploratory filtering (gene name, coordinates, length, pairing), use the dsRNAscan Shiny browser: https://dsrna.chpc.utah.edu/shiny/dsrna/

User guide: https://dsrna.chpc.utah.edu/tools/user_guide.html

Column dictionary

Full per-column descriptions, types, units, and source-column traces are in data_dictionary.tsv. The recipes page also shows how to print the schema directly from any Parquet file using pandas / polars / DuckDB.
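As a minimal sketch, the dictionary can also be read with the Python standard library alone. The header names in the toy sample (`column`, `type`, `description`) are placeholders, since the real header row of data_dictionary.tsv is not listed on this page; the helper name `read_dictionary` is hypothetical.

```python
import csv
import io


def read_dictionary(handle):
    """Parse a tab-separated dictionary into (header names, list of row dicts)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows


# Toy stand-in for the real file; header names here are assumptions.
sample = (
    "column\ttype\tdescription\n"
    "gtex_model_score\tfloat\tPlaceholder description text\n"
)

fields, rows = read_dictionary(io.StringIO(sample))
print(fields)
print(rows[0]["column"])
```

For the real file, open it with `open("data_dictionary.tsv", newline="", encoding="utf-8")` and inspect the returned header names first to see which fields are actually available.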

Version & changelog

v1 (2026): initial public release. Replaces the old 1.5 GB GFF3-only release and the old 11 GB Shiny SQLite database. Introduces friendly column names and splits sequences and structures into a separate companion file. Adds GFF3 + BP browser tracks.
