Human dsRNAome Resource

Loading the Parquet file

The main dataset (dsRNA_human_v1.parquet, ~325 MB, zstd-compressed) carries all 5,134,754 dsRNAs and every analytical column except the RNA sequence and predicted structure (those live in the extended companion, joined on dsRNA_id). Parquet is columnar and zstd-compressed, so column-projected queries and filters run in seconds even at full size.

pandas

import pandas as pd

df = pd.read_parquet('dsRNA_human_v1.parquet')

# High-confidence by all 3 models
hc = df[df['n_models_high_conf'] == 3]

# Conserved across vertebrates (both arms phastCons100 > 0.5)
conserved = hc[(hc['i_phast100'] > 0.5) & (hc['j_phast100'] > 0.5)]

# Stranded editing on both arms
edited_both = hc[(hc['stranded_editing_i_sites'] > 0) &
                 (hc['stranded_editing_j_sites'] > 0)]

print(f"{len(hc):,} HC, {len(conserved):,} conserved, {len(edited_both):,} edited on both arms")

polars (lazy — fastest for big slices)

Lazy scans push filters and column selection down to the Parquet reader, so they only touch the columns and row groups they need. Significantly faster than pandas on cold reads of the full file.

import polars as pl

# Lazy scan: nothing is loaded until .collect()
hits = (pl.scan_parquet('dsRNA_human_v1.parquet')
          .filter(pl.col('n_models_high_conf') == 3)
          .filter((pl.col('i_gene_name') == 'MYC') |
                  (pl.col('j_gene_name') == 'MYC'))
          .select(['dsRNA_id', 'chr', 'i_start', 'j_end',
                   'energy_kcal_mol', 'gtex_model_score'])
          .collect())
print(hits)

DuckDB (SQL directly on the file)

DuckDB lets you query the parquet without converting to a database, and you can join two parquets in a single SQL statement — handy for the main + extended pair.

import duckdb

con = duckdb.connect()

# Top 20 most stable HC-by-all-3 dsRNAs in a region
hits = con.execute("""
  SELECT dsRNA_id, chr, i_start, j_end,
         energy_kcal_mol, longest_helix,
         gtex_model_score, stability_model_score, structure_probing_score
  FROM 'dsRNA_human_v1.parquet'
  WHERE n_models_high_conf = 3
    AND chr = 'chr8'
    AND i_start BETWEEN 128000000 AND 129000000
  ORDER BY energy_kcal_mol ASC
  LIMIT 20
""").df()

Joining the sequence / structure companion

The extended file (dsRNA_human_v1_extended.parquet, ~522 MB) is keyed by dsRNA_id and carries the RNA sequence and dot-bracket predicted structure. Pull it in only when you need it — keeping the join lazy preserves Parquet's filter-pushdown advantages.

# pandas
main = pd.read_parquet('dsRNA_human_v1.parquet')
ext  = pd.read_parquet('dsRNA_human_v1_extended.parquet')
with_seq = (main[main['n_models_high_conf'] == 3]
              .merge(ext, on='dsRNA_id', how='left'))

# DuckDB (SQL join in one query)
df = con.execute("""
  SELECT m.dsRNA_id, m.chr, m.energy_kcal_mol, m.gtex_model_score,
         e.sequence, e.predicted_structure
  FROM 'dsRNA_human_v1.parquet' m
  JOIN 'dsRNA_human_v1_extended.parquet' e USING (dsRNA_id)
  WHERE m.n_models_high_conf = 3
    AND m.i_phast100 > 0.5 AND m.j_phast100 > 0.5
""").df()

Column reference

Full descriptions, types, units, and source-column traces are in the data_dictionary.tsv. You can also introspect from any of the loaders above:

# polars
pl.scan_parquet('dsRNA_human_v1.parquet').collect_schema()

# DuckDB
con.execute("DESCRIBE SELECT * FROM 'dsRNA_human_v1.parquet'").df()

# pandas
df.dtypes

GFF3 for genome browsers

Two bgzipped + tabix-indexed GFF3 files load directly into IGV, UCSC, JBrowse, or any GFF3-aware viewer: dsRNA_all.gff3.gz (all 5.1M) and dsRNA_high_confidence.gff3.gz (the any-of-3 subset, ~2.5M).

Each dsRNA is encoded as one mRNA feature with two exon children — one per arm. The loop region between the arms is implicit (it's the gap between the two exons, effectively an intron). This is fully GFF3-compliant and uses parent-child relationships that every browser already knows how to render.

chr1  RNAduplex  mRNA  10195  19583  .  -  .  ID=dsRNA_chr1_1;energy=-270.1;match_perc=71.03;...
chr1  dsRNAscan  exon  10195  10512  .  -  .  ID=exon-chr1_1-1;Parent=dsRNA_chr1_1
chr1  dsRNAscan  exon  19260  19583  .  -  .  ID=exon-chr1_1-2;Parent=dsRNA_chr1_1

The mRNA line carries the analytical attributes: energy, match_perc, i_phast100, GTEx, Stability, Probing, sequence, predicted_structure, forna_link, etc.
The two exon lines carry only ID and Parent.
The mRNA start / end cover the full dsRNA span; the two exons cover the i-arm and j-arm coordinates separately.
All coordinates are 1-based, inclusive (GFF3 convention).

Quick IGV recipe

File -> Load from URL...
URL: https://dsrna.chpc.utah.edu/tools/downloads/dsRNA_high_confidence.gff3.gz
Index URL: (leave blank; the .tbi auto-resolves)

FORNA URL reconstruction

The GFF3 carries a forna_link attribute on each mRNA line for one-click structure viewing. The Parquet/SQLite files omit it because it's deterministic from sequence and predicted_structure:

import urllib.parse
def forna_url(seq: str, struct: str) -> str:
    base = 'http://rna.tbi.univie.ac.at/forna/forna.html?id=url/name'
    return f'{base}&sequence={urllib.parse.quote(seq)}&structure={urllib.parse.quote(struct)}'

About the BEDPE files

The slim BEDPE files at downloads encode each dsRNA as one row with two genomic intervals — columns 1-3 are the i-arm, columns 4-6 are the j-arm. This shape is the native fit for two-armed biological objects and pairs with bedtools pairtobed for overlap questions ("either arm overlaps a gene", "both arms in the same gene", "one arm exonic, other intronic"). Columns 11-38 carry every analytical attribute (arm lengths, helix length, energy, percent paired, gene names, repeats, ML scores, editing counts).

Six files are provided so you can start with the slice you actually want:

File	Subset	n dsRNAs
`dsRNA_all_slim.bedpe.gz`	everything	5,134,754
`dsRNA_high_conf_any_slim.bedpe.gz`	high-confidence by ≥ 1 model	2,458,476
`dsRNA_high_conf_all3_slim.bedpe.gz`	high-confidence by all 3 models	1,509,138
`dsRNA_gtex_high_conf_slim.bedpe.gz`	GTEx model high-conf	1,713,035
`dsRNA_stability_high_conf_slim.bedpe.gz`	Stability model high-conf	2,125,678
`dsRNA_probing_high_conf_slim.bedpe.gz`	Probing model high-conf	2,226,288

Each .bedpe.gz has a sibling .tbi tabix index at the same URL with .tbi appended — download it separately if you want random-access region queries.

Warning: most bedtools subcommands silently ignore the second arm.

bedtools intersect, sort, merge, coverage, closest, etc. all read cols 1-3 as the interval and treat cols 4-6 as opaque extra columns. The output looks plausible but only reflects the i-arm. The BEDPE-aware subcommands are pairtobed (BEDPE vs BED/GFF/VCF), pairtopair (BEDPE vs BEDPE), bedpetobam, and bamtobed -bedpe. For anything else, split the file into per-arm BEDs first — see the last section on this page.

Column layout

Headerless TSV, 38 columns. The first 10 are the standard BEDPE spec; the rest are dsRNA-specific extras you'll use for filtering with awk.

#	Column	Notes
1-6	chr1 start1 end1 chr2 start2 end2	i-arm + j-arm coords, 0-based half-open
7	name	dsRNA_id (e.g. `dsRNA_chr1_42`)
8	score	round(avg(i_length, j_length)) — drives arc thickness in IGV
9-10	strand1 strand2	both strands (the dsRNA's strand)
11-13	i_length j_length loop_length	nt
14	energy_kcal_mol	RNAduplex free energy, 2dp
15	percent_paired	integer %
16	longest_helix	bp
17-18	i_gene_name j_gene_name	stale — will be redone vs GENCODE v49 in v2
19	repetitive	`Repetitive` / `Non-Repetitive`
20-23	i_phast100 j_phast100 i_phast17 j_phast17	conservation, 2dp
24	gtex_model_score	0-1, 3dp. HC ≥ 0.2513
25	stability_model_score	0-1, 3dp. HC ≥ 0.2471
26	structure_probing_score	0-1 (min-max normalized), 3dp. HC ≥ 0.46
27-29	gtex_high_conf stability_high_conf structure_probing_high_conf	0/1 booleans
30	n_models_high_conf	0-3 (sum of the three booleans)
31-32	stranded_editing_i_sites stranded_editing_j_sites	REDIportal, strand-aware
33-34	unstranded_editing_i_sites unstranded_editing_j_sites	strand-agnostic
35-36	stranded_editing_sites unstranded_editing_sites	i + j totals
37-38	i_repetitive_element j_repetitive_element	RepeatMasker family/name(s)

Quick filtering with awk

The simplest pipeline pattern is zcat | awk | bgzip. Decompress on the fly, filter on any column, recompress (and re-tabix if you want random access).

All 3 models call it high-confidence?

zcat dsRNA_all_slim.bedpe.gz \
  | awk -F'\t' '$30 == 3' \
  | bgzip > dsRNA_unanimous.bedpe.gz
tabix -p bed dsRNA_unanimous.bedpe.gz   # ok despite warning -- pairtobed will still work

Same logic as dsRNA_high_conf_all3_slim.bedpe.gz — useful as a template for tighter cutoffs.

GTEx high-conf AND tightly paired AND strong helix?

zcat dsRNA_all_slim.bedpe.gz \
  | awk -F'\t' '$27 == 1 && $15 >= 85 && $16 >= 40' \
  | bgzip > dsRNA_gtex_tight.bedpe.gz

Col 27 = gtex_high_conf, 15 = percent_paired, 16 = longest_helix.

High GTEx score AND has stranded editing on both arms?

zcat dsRNA_high_conf_any_slim.bedpe.gz \
  | awk -F'\t' '$24 >= 0.5 && $31 > 0 && $32 > 0' \
  | bgzip > dsRNA_gtex_edited.bedpe.gz
zcat dsRNA_gtex_edited.bedpe.gz | wc -l

Conservation across vertebrates (both arms phastCons100 > 0.5)?

zcat dsRNA_all_slim.bedpe.gz \
  | awk -F'\t' '$20 > 0.5 && $21 > 0.5' \
  | bgzip > dsRNA_conserved.bedpe.gz

Non-repetitive only (drop everything sitting in an Alu/LINE/SINE)?

zcat dsRNA_high_conf_any_slim.bedpe.gz \
  | awk -F'\t' '$19 == "Non-Repetitive"' \
  | bgzip > dsRNA_nonrep.bedpe.gz

Only ~6% of high-conf dsRNAs are non-repetitive; this is a useful pre-filter for cross-organism comparisons.

Energetically stable AND long (> 100 bp longest helix)?

zcat dsRNA_all_slim.bedpe.gz \
  | awk -F'\t' '$14 <= -100 && $16 > 100' \
  | bgzip > dsRNA_strong.bedpe.gz

pairtobed: overlap recipes

bedtools pairtobed understands the two-arm relationship natively. Its -type flag controls how the two arms must intersect feature B:

-type	Returns dsRNAs where...
`either`	at least one arm overlaps any feature in B (default)
`both`	both arms overlap the same feature in B
`xor`	exactly one arm overlaps B (one in, one out)
`neither`	neither arm overlaps B
`ispan`	the loop region (between the arms) overlaps B
`notispan`	the loop region does not overlap B
`ospan`	any part of the i-arm-start to j-arm-end span overlaps B

Other useful flags: -s (same strand only), -S (opposite strand only), -f 0.5 (require at least 50% of A's interval to overlap B).

Worked example 1: dsRNAs near m6A sites

You need a hg38 BED of m6A site coordinates. Two well-curated databases publish them:

m6A-Atlas 2.0 — integrated high-confidence m6A sites across tissues. Download from the Download tab; select GRCh38 / BED format.
REPIC — RNA Epitranscriptome Collection. Has per-study downloads + a merged catalog. GRCh38 BEDs are under Download → Browse.

The recipes below assume the file is named m6A_sites.bed (any BED4+ format works — only cols 1-3 are used for overlap; the name column lets you trace hits back if you want).

Which high-conf dsRNAs have an m6A site on either arm?

bedtools pairtobed \
  -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b m6A_sites.bed \
  -type either \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > dsRNAs_with_m6A.bedpe.gz
zcat dsRNAs_with_m6A.bedpe.gz | wc -l

Output is a valid BEDPE with all 38 analytical columns intact — load it in IGV, hand it to another pairtobed call, or open in pandas. cut -f1-38 drops the appended m6A feature columns; awk '!seen[$7]++' dedupes by dsRNA_id (col 7) so dsRNAs that overlap multiple m6A sites only appear once.

m6A on i-arm AND m6A on j-arm of the same dsRNA (paired methylation)?

bedtools pairtobed -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b m6A_sites.bed -type both \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > dsRNAs_m6A_paired.bedpe.gz

-type both requires the SAME m6A site to be present in both arms — rare. If you want "any m6A on each arm", use two intersects (one per arm BED) and intersect the resulting dsRNA-id sets (see the per-arm-split section).

m6A density in the loop region (back-splice / circRNA candidates)?

bedtools pairtobed -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b m6A_sites.bed -type ispan \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > dsRNAs_m6A_loop.bedpe.gz
zcat dsRNAs_m6A_loop.bedpe.gz | wc -l

-type ispan tests overlap with the implicit "loop" between the two arms — useful for catching dsRNAs that flank a methylation cluster.

Worked example 2: splice sites & exons

Use GENCODE v49 (or your favourite annotation). Extract exons into a BED with transcript_id in the name column — this lets you ask "same transcript" questions later.

# One-time prep: download GENCODE v49 + build an exons BED carrying transcript_id.
# (On macOS use `gzcat` in place of `zcat`.)
curl -sLO https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_49/gencode.v49.annotation.gff3.gz

zcat gencode.v49.annotation.gff3.gz \
  | awk -F'\t' '$3=="exon"' \
  | awk -F'\t' '{ split($9, a, ";");
                  for (i in a) if (a[i] ~ /^transcript_id=/) tid=substr(a[i], 15);
                  printf "%s\t%d\t%d\t%s\n", $1, $4-1, $5, tid; }' \
  | sort -k1,1 -k2,2n > gencode_v49_exons.bed

Both arms overlap an exon (same transcript, "fully exonic")?

bedtools pairtobed -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b gencode_v49_exons.bed -type both \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > dsRNAs_exonic_pairs.bedpe.gz

For the strict "BOTH arms 100% inside same transcript" definition you need a per-arm intersect pipeline (pairtobed -type both only checks at-least-one-bp overlap). See the per-arm-split section below.

One arm exonic, the other intronic (asymmetric, edits one isoform)?

bedtools pairtobed -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b gencode_v49_exons.bed -type xor \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > dsRNAs_exon_intron.bedpe.gz

Arms flank an exon (loop = the exon, classic back-splice geometry)?

bedtools pairtobed -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b gencode_v49_exons.bed -type ispan \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > dsRNAs_flanking_exon.bedpe.gz

Worked example 3: ENCODE eCLIP (ILF3)

ENCODE eCLIP peaks for any RBP are downloadable as BED narrowPeak files from encodeproject.org/eclip/. ADAR / ADARB1 are not in the ENCODE eCLIP catalog, but several well-characterised dsRNA binders are — ILF3 (NF90/NF110), DHX9, STAU2, IGF2BP1/2/3. ILF3 makes a good starting point because it has the most direct dsRNA-binding biology of the set.

Two ILF3 eCLIP experiments are released on ENCODE (rep1+rep2 IDR-merged narrowPeak files, GRCh38, blacklist-removed):

# ILF3 in K562 (~3K peaks)
curl -sLO https://www.encodeproject.org/files/ENCFF385MJU/@@download/ENCFF385MJU.bed.gz
gunzip -c ENCFF385MJU.bed.gz > ILF3_K562_eCLIP.bed

# ILF3 in HepG2 (different binding landscape, ~10K peaks)
curl -sLO https://www.encodeproject.org/files/ENCFF071QDP/@@download/ENCFF071QDP.bed.gz
gunzip -c ENCFF071QDP.bed.gz > ILF3_HepG2_eCLIP.bed

Experiment pages for reference: ENCSR438KWZ (K562), ENCSR786TSC (HepG2). The files are 10-column BED narrowPeak format; pairtobed reads cols 1-3 so the extra columns are harmless.

dsRNAs with an ILF3 eCLIP peak on either arm?

bedtools pairtobed -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b ILF3_K562_eCLIP.bed -type either \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > ILF3_bound_dsRNAs.bedpe.gz

Both arms of the dsRNA covered by ILF3 peaks (strong binding)?

bedtools pairtobed -a dsRNA_high_conf_any_slim.bedpe.gz \
  -b ILF3_K562_eCLIP.bed -type both \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > ILF3_double_bound.bedpe.gz

-type both requires the SAME peak to cover both arms — very strict. For "any peak on each arm" use two intersects (one per arm BED) and intersect the resulting dsRNA-id sets.

High-conf dsRNAs NOT bound by ILF3 (interesting outliers)?

bedtools pairtobed -a dsRNA_high_conf_all3_slim.bedpe.gz \
  -b ILF3_K562_eCLIP.bed -type neither \
  | cut -f1-38 \
  | awk -F'\t' '!seen[$7]++' \
  | bgzip > high_conf_no_ILF3.bedpe.gz

-type neither reports each dsRNA at most once, so the dedup is belt-and-suspenders.

Worked example 4: intermolecular dsRNAs (sense-antisense pairs)

The main BEDPE files cover intramolecular hairpin dsRNAs. A separate, smaller file covers intermolecular dsRNAs — sense / antisense gene pairs whose transcripts overlap and can pair in trans:

curl -sLO https://dsrna.chpc.utah.edu/intermolecular_dsRNA.gff3

769 records, GFF3 format, single feature type antisense_dsRNA. Key attributes: gene_plus, gene_minus, biotype_plus, biotype_minus, total_overlap_bp, total_editing_sites, n_eers (editing-enriched regions), dsrna_tx_plus / dsrna_tx_minus (Ensembl transcript IDs per strand).

Load in IGV with one click?

File -> Load from URL...
URL: https://dsrna.chpc.utah.edu/intermolecular_dsRNA.gff3
Index URL: (leave blank)

Pairs involving a specific gene (e.g. TP53)?

awk -F'\t' '$3=="antisense_dsRNA"' intermolecular_dsRNA.gff3 \
  | grep -E 'gene_plus=TP53(;|$)|gene_minus=TP53(;|$)'
# Returns the TP53 / WRAP53 antisense pair on chr17.

The (;|$) anchor avoids false hits like TP53AIP1 when you really wanted TP53. ADAR / ADARB1 don't have antisense partners in this dataset.

Heavily-edited intermolecular dsRNAs (>= 20 editing sites)?

awk -F'\t' '$3=="antisense_dsRNA" {
              match($9, /total_editing_sites=([0-9]+)/, m);
              if (m[1]+0 >= 20) print
            }' intermolecular_dsRNA.gff3 \
  | wc -l

Requires gawk (for the match() third-arg array). On macOS install via brew install gawk and call as gawk.

List all gene pairs as a 2-column TSV (for downstream joins)?

awk -F'\t' '$3=="antisense_dsRNA" {
              split($9, a, ";");
              gp=""; gm="";
              for (i in a) {
                if (a[i] ~ /^gene_plus=/)  gp = substr(a[i], 11);
                if (a[i] ~ /^gene_minus=/) gm = substr(a[i], 12);
              }
              print gp"\t"gm
            }' intermolecular_dsRNA.gff3 > intermolecular_pairs.tsv
wc -l intermolecular_pairs.tsv

Both partners protein-coding (drop lncRNA / pseudogene pairs)?

awk -F'\t' '$3=="antisense_dsRNA" \
            && $9 ~ /biotype_plus=protein_coding/ \
            && $9 ~ /biotype_minus=protein_coding/' intermolecular_dsRNA.gff3 \
  > intermolecular_pc_only.gff3
wc -l intermolecular_pc_only.gff3

bedpetobam: load BEDPE in samtools / IGV.org workflows

bedtools bedpetobam converts BEDPE into a paired-end-style BAM so you can sort, index, and load alongside read alignments in IGV, samtools, or any BAM-aware tool.

# Get a chromosome-sizes file (hg38). Pick whichever you have available:
#   (a) UCSC tools:   fetchChromSizes hg38 > hg38.chromsizes
#   (b) Indexed FASTA: cut -f1,2 hg38.fa.fai > hg38.chromsizes
#   (c) Direct download from UCSC:
curl -sL https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes \
     -o hg38.chromsizes

# Convert + sort + index.
bedtools bedpetobam -i dsRNA_high_conf_all3_slim.bedpe.gz \
                    -g hg38.chromsizes \
  | samtools sort -o dsRNA_high_conf_all3.bam
samtools index dsRNA_high_conf_all3.bam

Now dsRNA_high_conf_all3.bam + .bai loads in IGV web/desktop as a normal BAM track, with each dsRNA appearing as a "read pair" connecting its two arms. You can also pipe it through samtools for region-based queries:

samtools view dsRNA_high_conf_all3.bam chr8:128,748,000-128,755,000

Caveat: bedpetobam in bedtools < 2.30 truncates output. Upgrade to 2.30+ if you hit that.

Per-arm splitting (when you need plain bedtools intersect)

If your downstream tool needs single-interval BED (e.g. bedtools coverage, genomecov, custom awk pipelines that expect cols 1-3), split the BEDPE into two BEDs — one per arm — before doing anything else:

zcat dsRNA_high_conf_any_slim.bedpe.gz \
  | awk -F'\t' 'BEGIN{OFS="\t"} {print $1, $2, $3, $7, $8, $9}' \
  | sort -k1,1 -k2,2n > i_arms.bed

zcat dsRNA_high_conf_any_slim.bedpe.gz \
  | awk -F'\t' 'BEGIN{OFS="\t"} {print $4, $5, $6, $7, $8, $10}' \
  | sort -k1,1 -k2,2n > j_arms.bed

Both files use the dsRNA_id as the BED name field (col 4), so you can join the two arms back together by ID after intersecting. Example — per-arm bp overlap with 3'UTRs, then rejoin:

# One-time prep: extract 3'UTRs from GENCODE v49.
zcat gencode.v49.annotation.gff3.gz \
  | awk -F'\t' '$3=="three_prime_UTR" {printf "%s\t%d\t%d\t.\n", $1, $4-1, $5}' \
  | sort -k1,1 -k2,2n > gencode_v49_3UTR.bed

bedtools intersect -a i_arms.bed -b gencode_v49_3UTR.bed -wo -sorted \
  | awk -F'\t' '{sum[$4] += $NF} END {for (id in sum) print id"\t"sum[id]}' \
  > i_3UTR_bp.tsv
bedtools intersect -a j_arms.bed -b gencode_v49_3UTR.bed -wo -sorted \
  | awk -F'\t' '{sum[$4] += $NF} END {for (id in sum) print id"\t"sum[id]}' \
  > j_3UTR_bp.tsv

# Join: dsRNAs with 3'UTR overlap on BOTH arms (id, bp_i, bp_j)
join -t$'\t' -j1 \
  <(sort i_3UTR_bp.tsv) \
  <(sort j_3UTR_bp.tsv) > both_arms_3UTR.tsv

# Subset the source BEDPE back down to just these dsRNAs - now you have a
# valid BEDPE for IGV / pairtobed / pandas, not just an ID stats table.
zcat dsRNA_high_conf_any_slim.bedpe.gz \
  | awk -F'\t' 'NR==FNR {ids[$1]=1; next} ids[$7]' both_arms_3UTR.tsv - \
  | bgzip > both_arms_3UTR.bedpe.gz

bedtools reference: pairtobed, bedpetobam.
Want to run the same questions interactively (without bedtools)? The Filter dsRNAs Shiny app exposes the same confidence columns through sliders and exports filtered subsets directly.
Bigger picture: the User Guide explains how the ML scores were trained and what the cutoffs mean biologically.

Recipes for dsRNA data analysis