dsRNA Downloads

Human dsRNAome v1 — tables, GFF3, BP arcs

Predictions of double-stranded RNA structures across the human genome (GRCh38 / hg38). The dataset contains 5,134,754 dsRNAs predicted with dsRNAscan + RNAduplex, each annotated with conservation, gene context, editing sites, and three ML model scores.

v1 · Generated 2026-05-23 · Pipeline 2025-06-19 · All files served from https://dsrna.chpc.utah.edu/tools/downloads/

Browser tracks & pipeline files

GFF3 for IGV / UCSC, BEDPE for bedtools pairtobed, BP arcs, high-confidence subsets, per-model variants, and a companion file with sequences.


Filter dsRNAs interactively

Search by gene, locus, or model score and inspect dsRNAs without downloading anything. Best for picking out specific predictions.


Download the main dataset

One Parquet file with all 5,134,754 dsRNAs and every analytical column (except RNA sequence and structure). Reads in seconds with pandas, polars, or DuckDB.

Download: dsRNA_human_v1.parquet (325 MB) · Column dictionary: data_dictionary.tsv (TSV)
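A minimal sketch of loading the main Parquet file with pandas and keeping only the stricter confidence tiers. It assumes the file has already been downloaded to a local path and that a Parquet engine such as pyarrow is installed. The helper names `load_main` and `select_tier` are illustrative and not part of any dataset tooling; the column names are the ones given on this page.

```python
import pandas as pd

# Columns named on this page; reading a subset keeps memory use down.
SCORE_COLUMNS = [
    "dsRNA_id",
    "gtex_model_score",
    "stability_model_score",
    "structure_probing_score",
    "n_models_high_conf",
]


def load_main(path: str, columns=SCORE_COLUMNS) -> pd.DataFrame:
    """Read selected columns from a local copy of dsRNA_human_v1.parquet."""
    return pd.read_parquet(path, columns=list(columns))


def select_tier(df: pd.DataFrame, min_models: int) -> pd.DataFrame:
    """Keep dsRNAs called high-confidence by at least `min_models` models."""
    return df[df["n_models_high_conf"] >= min_models]


# Usage (assumes the file sits in the working directory):
# df = load_main("dsRNA_human_v1.parquet")
# strict = select_tier(df, min_models=3)
```

Passing `min_models=1` should correspond to the "high-confidence by any model" tier and `min_models=3` to the strictest tier.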
| File | Format | Size | # dsRNAs | Description |
|---|---|---|---|---|
| dsRNA_human_v1.parquet | Parquet (zstd) | 325 MB | 5,134,754 | Main dataset for analysis. Every column except sequence and predicted_structure. Recommended starting point for most users: filter, group, and plot with pandas / polars / DuckDB. |
| dsRNA_human_v1_extended.parquet | Parquet (zstd) | 522 MB | 5,134,754 | Sequences and structures. Companion to the main parquet, keyed by dsRNA_id. Join when you need the RNA sequence or the dot-bracket structure. |
| data_dictionary.tsv | TSV | 6.6 KB | | Column reference. Machine-readable per-column descriptions, types, units, and source-column traces. |
| README.md | Markdown | | | Markdown copy of this page, included with the dataset for offline reading. |
| dsRNA_all.gff3.gz | bgzipped GFF3 | 1.5 GB | 5,134,754 | All 5.1M dsRNAs for genome browsers. mRNA / exon features (one mRNA + two exon rows per dsRNA, see encoding section below). Loads directly in IGV, UCSC, JBrowse. |
| dsRNA_high_confidence.gff3.gz | bgzipped GFF3 | 955 MB | 2,458,476 | High-confidence subset for genome browsers. dsRNAs called high-confidence by at least one of the three ML models. |
| intermolecular_dsRNA.gff3 | GFF3 | small | 769 | Intermolecular dsRNAs. Sense–antisense gene pairs forming dsRNAs between separate transcripts (distinct from the intramolecular hairpins in the main dataset). |
| dsRNA_all_slim.bedpe.gz | bgzipped BEDPE | 240 MB | 5,134,754 | All 5.1M dsRNAs for bedtools pipelines. One row per dsRNA, two intervals (i-arm + j-arm). Built for bedtools pairtobed. See the BEDPE warning below. |
| dsRNA_high_conf_any_slim.bedpe.gz | bgzipped BEDPE | 123 MB | 2,458,476 | High-confidence by any of 3 models. Same schema as the all-dsRNAs BEDPE; subset to dsRNAs called high-confidence by at least one of GTEx / Stability / Probing. |
| dsRNA_high_conf_all3_slim.bedpe.gz | bgzipped BEDPE | 77 MB | 1,509,138 | High-confidence by all 3 models. Strictest tier: called high-confidence by GTEx, Stability, AND Probing. |
| dsRNA_gtex_high_conf_slim.bedpe.gz | bgzipped BEDPE | 88 MB | 1,713,035 | High-confidence by the GTEx model, regardless of the other two. gtex_model_score ≥ 0.2513. |
| dsRNA_stability_high_conf_slim.bedpe.gz | bgzipped BEDPE | 107 MB | 2,125,678 | High-confidence by the Stability model, regardless of the other two. stability_model_score ≥ 0.2471. |
| dsRNA_probing_high_conf_slim.bedpe.gz | bgzipped BEDPE | 111 MB | 2,226,288 | High-confidence by the Probing model, regardless of the other two. structure_probing_score ≥ 0.0315 (raw scale). |
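
When the RNA sequence or dot-bracket structure is needed, the extended Parquet file can be joined back onto the main table via dsRNA_id. The following is a small pandas sketch, not the dataset's official recipe. `attach_sequences` is a hypothetical helper name, and the one-to-one check assumes that dsRNA_id is unique in both files, which this page implies but does not state outright.

```python
import pandas as pd


def attach_sequences(main: pd.DataFrame, extended: pd.DataFrame) -> pd.DataFrame:
    """Left-join the extended columns onto a (possibly filtered) main table."""
    # validate="one_to_one" raises MergeError if dsRNA_id is duplicated on
    # either side (assumption: the key is unique in both files).
    return main.merge(extended, on="dsRNA_id", how="left", validate="one_to_one")


# Usage (assumes both files were downloaded locally):
# main = pd.read_parquet("dsRNA_human_v1.parquet")
# ext = pd.read_parquet("dsRNA_human_v1_extended.parquet")
# subset = main[main["n_models_high_conf"] == 3]
# with_seq = attach_sequences(subset, ext)
```

Filtering the main table first and joining afterwards keeps the join small, since the extended file is the larger of the two.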

Each .bedpe.gz and .gff3.gz has a sibling tabix index at the same URL with .tbi appended (e.g. dsRNA_all_slim.bedpe.gz.tbi). Download the index separately if you want random-access region queries.
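
The naming rule above can be sketched as a small helper that builds the data and index URLs for a given file. The base URL is the one stated at the top of this page; the function name `data_and_index_urls` is illustrative only.

```python
from urllib.parse import urljoin

# Download location stated on this page.
BASE_URL = "https://dsrna.chpc.utah.edu/tools/downloads/"


def data_and_index_urls(filename: str) -> tuple[str, str]:
    """Return (data_url, index_url) for a bgzipped BEDPE or GFF3 file."""
    # Only the .bedpe.gz and .gff3.gz files are described as having an index.
    if not filename.endswith((".bedpe.gz", ".gff3.gz")):
        raise ValueError(f"no tabix index documented for {filename!r}")
    data_url = urljoin(BASE_URL, filename)
    return data_url, data_url + ".tbi"


# Usage:
# data_url, index_url = data_and_index_urls("dsRNA_all_slim.bedpe.gz")
```

Tabix-aware tools generally look for the index next to the data file under the same name plus .tbi, so saving both downloads into one directory with their original names is the simplest setup.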

Working with the files

Worked examples for loading, filtering, and overlap analysis are available on the recipes page.

High-confidence definition

Three independent ML models score each dsRNA. A dsRNA is considered high-confidence by a given model when its raw score meets or exceeds the documented threshold:

| Model | Column | Threshold | High-conf count |
|---|---|---|---|
| GTEx editing | gtex_model_score | >= 0.2513 | 1,713,035 |
| Stability (no-GTEx) | stability_model_score | >= 0.2471 | 2,125,678 |
| Structure-probing 3'UTR | structure_probing_score | >= 0.0315 | 2,226,288 |

The n_models_high_conf column (integer 0-3) is the sum of the three boolean *_high_conf columns. Filter on it for tiered confidence levels.
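
To make the relationship between scores, per-model flags, and n_models_high_conf concrete, here is a plain-Python sketch that applies the thresholds from the table above to one dsRNA's scores. The function names are illustrative, and the sketch is a reconstruction of the rule as described on this page, not the pipeline's own code.

```python
# Thresholds copied from the table above (raw score scale).
THRESHOLDS = {
    "gtex_model_score": 0.2513,
    "stability_model_score": 0.2471,
    "structure_probing_score": 0.0315,
}


def high_conf_flags(scores: dict) -> dict:
    """Per-model booleans: True when the score is at or above its threshold."""
    return {col: scores[col] >= cut for col, cut in THRESHOLDS.items()}


def n_models_high_conf(scores: dict) -> int:
    """Number of models (0-3) calling this dsRNA high-confidence."""
    return sum(high_conf_flags(scores).values())


example = {
    "gtex_model_score": 0.2513,       # exactly at threshold, counts as high-conf
    "stability_model_score": 0.10,    # below threshold
    "structure_probing_score": 0.05,  # above threshold
}
print(n_models_high_conf(example))  # 2
```

In the published tables the count is already precomputed, so recomputing it like this is mainly useful as a cross-check or when experimenting with different cutoffs.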

Note: gtex_confidence_label and stability_confidence_label are pre-computed string labels from the upstream pipeline and almost always agree with the booleans. Use the booleans for filtering and the labels for cross-checks.

Interactive browser

For exploratory filtering (gene name, coordinates, length, pairing), use the dsRNAscan Shiny browser: https://dsrna.chpc.utah.edu/shiny/dsrna/

User guide: https://dsrna.chpc.utah.edu/tools/user_guide.html

Column dictionary

Full per-column descriptions, types, units, and source-column traces are in data_dictionary.tsv. The recipes page also shows how to print the schema directly from any Parquet file using pandas / polars / DuckDB.

Version & changelog