Predictions of double-stranded RNA structures across the human genome (GRCh38 / hg38). 5,134,754 dsRNAs from dsRNAscan + RNAduplex, annotated with conservation, gene context, editing sites, and three ML model scores.
Browser tracks & pipeline files
GFF3 for IGV / UCSC, BEDPE for bedtools pairtobed, BP arcs, high-confidence subsets, per-model variants, sequences companion.
Filter dsRNAs interactively
Search by gene, locus, or model score and inspect dsRNAs without downloading anything. Best for picking out specific predictions.
Download the main dataset
One Parquet file with all 5,134,754 dsRNAs and every analytical column (except RNA sequence and structure). Reads in seconds with pandas, polars, or DuckDB.
| File | Format | Size | # dsRNAs | Description |
|---|---|---|---|---|
| dsRNA_human_v1.parquet | Parquet (zstd) | 325 MB | 5,134,754 | Main dataset for analysis. Every column except sequence and predicted_structure. Recommended starting point for most users — filter, group, plot with pandas / polars / DuckDB. |
| dsRNA_human_v1_extended.parquet | Parquet (zstd) | 522 MB | 5,134,754 | Sequences and structures. Companion to the main parquet, keyed by dsRNA_id. Join when you need the RNA sequence or the dot-bracket structure. |
| data_dictionary.tsv | TSV | 6.6 KB | — | Column reference. Machine-readable per-column descriptions, types, units, and source-column traces. |
| README.md | Markdown | — | — | Markdown copy of this page — included with the dataset for offline reading. |
| dsRNA_all.gff3.gz | bgzipped GFF3 | 1.5 GB | 5,134,754 | All 5.1M dsRNAs for genome browsers. mRNA / exon features (one mRNA + two exon rows per dsRNA, see encoding section below). Loads directly in IGV, UCSC, JBrowse. |
| dsRNA_high_confidence.gff3.gz | bgzipped GFF3 | 955 MB | 2,458,476 | High-confidence subset for genome browsers. dsRNAs called high-confidence by at least one of the three ML models. |
| intermolecular_dsRNA.gff3 | GFF3 | small | 769 | Intermolecular dsRNAs. Sense–antisense gene pairs forming dsRNAs between separate transcripts (distinct from the intramolecular hairpins in the main dataset). |
| dsRNA_all_slim.bedpe.gz | bgzipped BEDPE | 240 MB | 5,134,754 | All 5.1M dsRNAs for bedtools pipelines. One row per dsRNA, two intervals (i-arm + j-arm). Built for bedtools pairtobed. See the BEDPE warning below. |
| dsRNA_high_conf_any_slim.bedpe.gz | bgzipped BEDPE | 123 MB | 2,458,476 | High-confidence by any of 3 models. Same schema as the all-dsRNAs BEDPE; subset to dsRNAs called high-confidence by at least one of GTEx / Stability / Probing. |
| dsRNA_high_conf_all3_slim.bedpe.gz | bgzipped BEDPE | 77 MB | 1,509,138 | High-confidence by all 3 models. Strictest tier — called high-confidence by GTEx, Stability, AND Probing. |
| dsRNA_gtex_high_conf_slim.bedpe.gz | bgzipped BEDPE | 88 MB | 1,713,035 | High-confidence by the GTEx model, regardless of the other two models. gtex_model_score ≥ 0.2513. |
| dsRNA_stability_high_conf_slim.bedpe.gz | bgzipped BEDPE | 107 MB | 2,125,678 | High-confidence by the Stability model, regardless of the other two models. stability_model_score ≥ 0.2471. |
| dsRNA_probing_high_conf_slim.bedpe.gz | bgzipped BEDPE | 111 MB | 2,226,288 | High-confidence by the Probing model, regardless of the other two models. structure_probing_score ≥ 0.0315 (raw scale). |
Each .bedpe.gz and .gff3.gz has a sibling tabix index at the same URL with .tbi appended (e.g. dsRNA_all_slim.bedpe.gz.tbi). Download the index separately if you want random-access region queries.
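To make the "one row, two intervals" layout concrete, here is a minimal stdlib-only sketch that streams a bgzipped BEDPE and keeps rows where either arm overlaps a query region. It assumes the first six columns follow the standard BEDPE layout (chrom1, start1, end1, chrom2, start2, end2, 0-based and half-open) and that the file has no column-name header row; the slim schema's remaining columns are passed through uninterpreted. The names `Arm`, `read_bedpe`, and `either_arm_overlaps` are illustrative and not part of the dataset or of bedtools.

```python
import gzip
from typing import Iterator, List, NamedTuple, Tuple


class Arm(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (standard BED convention, assumed here)
    end: int    # exclusive


def read_bedpe(path: str) -> Iterator[Tuple[Arm, Arm, List[str]]]:
    """Yield (i_arm, j_arm, remaining_columns) for each BEDPE row.

    bgzip output is a series of concatenated gzip members, which the
    gzip module reads transparently.
    """
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and comment / browser-track lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            cols = line.rstrip("\n").split("\t")
            i_arm = Arm(cols[0], int(cols[1]), int(cols[2]))
            j_arm = Arm(cols[3], int(cols[4]), int(cols[5]))
            yield i_arm, j_arm, cols[6:]


def overlaps(arm: Arm, chrom: str, start: int, end: int) -> bool:
    """Half-open interval overlap test on the same chromosome."""
    return arm.chrom == chrom and arm.start < end and start < arm.end


def either_arm_overlaps(path: str, chrom: str, start: int, end: int):
    """Yield rows where the i-arm OR the j-arm overlaps the query region."""
    for i_arm, j_arm, rest in read_bedpe(path):
        if overlaps(i_arm, chrom, start, end) or overlaps(j_arm, chrom, start, end):
            yield i_arm, j_arm, rest
```

This is a full scan of the file, so it suits one-off checks rather than repeated region queries. Testing both arms is the point of the sketch: a tool that reads only the first three columns of each row would see the i-arm and never the j-arm.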
Working with the files
Worked examples for loading, filtering, and overlap analysis live on the recipes page:
- Loading the Parquet: pandas, polars, DuckDB; joining the sequences companion.
- GFF3 encoding & IGV recipe: mRNA + exon parent-child layout, one-click load.
- BEDPE & bedtools recipes: awk filtering by confidence column, pairtobed for m6A / exon / eCLIP overlap, bedpetobam → IGV, and the critical "intersect silently breaks BEDPE" warning.
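As a rough illustration of the first item, the following pandas sketch filters the main Parquet by confidence tier and then joins the sequences companion on dsRNA_id. The column names dsRNA_id and n_models_high_conf come from this page; the function names, the local file paths in the usage comment, and the choice of a left join are assumptions made for the example, not the recipes page's own code.

```python
import pandas as pd


def attach_sequences(main: pd.DataFrame, extended: pd.DataFrame) -> pd.DataFrame:
    """Left-join the companion columns onto the main table by dsRNA_id.

    validate="one_to_one" makes pandas raise if either side has
    duplicate dsRNA_id values, which would silently multiply rows.
    """
    return main.merge(extended, on="dsRNA_id", how="left", validate="one_to_one")


def load_high_confidence_with_sequences(
    main_path: str, extended_path: str, min_models: int = 3
) -> pd.DataFrame:
    """Load both Parquet files and keep dsRNAs passing >= min_models models."""
    main = pd.read_parquet(main_path)
    # Filter first so the join only carries the rows that are needed.
    subset = main[main["n_models_high_conf"] >= min_models]
    extended = pd.read_parquet(extended_path)
    return attach_sequences(subset, extended)


# Usage (paths are hypothetical local copies of the downloads):
# df = load_high_confidence_with_sequences(
#     "dsRNA_human_v1.parquet", "dsRNA_human_v1_extended.parquet", min_models=3
# )
```

Filtering before the join keeps the merged frame small, but the companion file is still read in full here, so expect its memory footprint to be well above its 522 MB on-disk size.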
High-confidence definition
Three independent ML models score each dsRNA. A dsRNA is considered high-confidence by a given model when its raw score is at or above the documented threshold:
| Model | Column | Threshold | High-conf count |
|---|---|---|---|
| GTEx editing | gtex_model_score | >= 0.2513 | 1,713,035 |
| Stability (no-GTEx) | stability_model_score | >= 0.2471 | 2,125,678 |
| Structure-probing 3'UTR | structure_probing_score | >= 0.0315 | 2,226,288 |
- Any of 3: 2,458,476 dsRNAs
- All 3: 1,509,138 dsRNAs
The n_models_high_conf column (integer 0-3) is the sum of the three boolean *_high_conf columns. Filter on it for tiered confidence levels.
Note: gtex_confidence_label and stability_confidence_label are pre-computed string labels from the upstream pipeline and almost always agree with the booleans. Use the booleans for filtering and the labels for cross-checks.
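The threshold rule in this section can be sketched in a few lines of plain Python. The score column names and thresholds are the ones listed in this section; the function names are illustrative, and treating a missing score as "not high-confidence" is an assumption of this sketch rather than documented pipeline behavior.

```python
from typing import Dict, Optional

# Thresholds as documented in this section; a score at or above the
# threshold counts as high-confidence for that model.
THRESHOLDS: Dict[str, float] = {
    "gtex_model_score": 0.2513,
    "stability_model_score": 0.2471,
    "structure_probing_score": 0.0315,
}


def high_conf_flags(scores: Dict[str, Optional[float]]) -> Dict[str, bool]:
    """Return one boolean per model score column."""
    flags = {}
    for column, threshold in THRESHOLDS.items():
        value = scores.get(column)
        # Assumption: a missing (None) score is treated as not high-confidence.
        flags[column] = value is not None and value >= threshold
    return flags


def n_models_high_conf(scores: Dict[str, Optional[float]]) -> int:
    """Count how many of the three models call this dsRNA high-confidence (0-3)."""
    return sum(high_conf_flags(scores).values())
```

With this rule, filtering on a count of at least 1 corresponds to the "any of 3" tier and a count of exactly 3 to the "all 3" tier.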
Interactive browser
For exploratory filtering (gene name, coordinates, length, pairing), use the dsRNAscan Shiny browser: https://dsrna.chpc.utah.edu/shiny/dsrna/
User guide: https://dsrna.chpc.utah.edu/tools/user_guide.html
Column dictionary
Full per-column descriptions, types, units, and source-column traces are in data_dictionary.tsv. The recipes page also shows how to print the schema directly from any Parquet file using pandas / polars / DuckDB.
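For scripted lookups, data_dictionary.tsv can also be read with the standard library. The sketch below assumes the file has a header row and that its first field holds the column name; the actual header names are not given on this page, so the code discovers them at read time. `load_dictionary` is an illustrative name.

```python
import csv
from typing import Dict


def load_dictionary(path: str) -> Dict[str, Dict[str, str]]:
    """Map each dataset column name to its row of the data dictionary.

    Assumes a tab-separated file with a header row whose first field
    identifies the dataset column being described.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if not reader.fieldnames:
            return {}  # empty file: nothing to index
        key_field = reader.fieldnames[0]
        return {row[key_field]: row for row in reader}


# Usage (path is a hypothetical local copy of the download):
# entry = load_dictionary("data_dictionary.tsv").get("n_models_high_conf")
```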
Version & changelog
- v1 (2026): initial public release. Replaces the old 1.5 GB GFF3-only release and the old 11 GB Shiny SQLite with friendly column names and a split structures companion. Adds GFF3 + BP browser tracks.