Center for Listening Research Dispatches · Long-form
CLR-001 · v.1
Filed 28 · APR · 2026
Dispatch № 001 · Improvisation Group · Field-recording deep dive

Three signals on, all at once.

I am building a small set of detectors that try to flag when a jam-band performance has left the song behind. Before the detectors are useful for finding anything in the corpus, I need to know they behave the way a careful listener would on cases the community already agrees on. UIC '98 Bathtub Gin is one of those cases. Twenty-four minutes, jamcharts-tagged, decades-deep consensus. This dispatch walks through what each of the three detectors does on that recording, minute by minute.

FILE COPY Do · Not · Remove
Performance: 1998 · 11 · 09
Venue: UIC Pavilion
Run time: 24:06
Keys (start / middle / end): G maj / C maj / C maj
Subtypes detected: A · B · C (full set)
Classification: Canonical (consensus pick)

I needed a reference Type II performance to anchor the detectors against, something the community had already settled on. ("Type II" is Phish-fan vernacular for an improvised jam that abandons the composition and turns into the band listening to each other.) The jamcharts (a community-curated list at phish.net of jams the fanbase has flagged as worth hearing) list a handful of Bathtub Gins as Type II. UIC '98 is the canonical pick of those. So I pulled it, ran the three subtype detectors I have so far, and watched each one light up where I expected it to.

The protocol is plain. Pick the performance the community has long agreed is a textbook Type II. Run all three detectors in sequence over the audio. Show that each subtype lights up where a careful listener would say it should: home key abandoned (Subtype A), chord palette the head never visits (Subtype B), individual chords held for thirty-plus seconds in a sustained drone (Subtype C). Then cross-check on the other Type-II-tagged Gins in the small reference set. Virginia Beach 1998-08-09 and Bethel 2024-08-11 fire all three with different shapes. The unflagged versions don't. Worcester '95 has the drone but never leaves home; the 2024-04-19 and 2024-07-28 versions barely move at all. The point is not a leaderboard. The point is: the detectors agree with the consensus on the cases the consensus is loud about.

Vocabulary, since I lean on it. The head is the composed first ninety seconds (the song as written). The jam is the improvised middle (whatever the band decides to do that night). The tail is the last ninety seconds (the band working back to the head). For UIC '98 that is twenty-four minutes total: about a minute and a half of head, twenty-one minutes of jam, then a return. Once a detector behaves sensibly against canonical references, the natural next step is to point it at unseen tape, recordings the community has not flagged, ideally from a different band, and ask whether it surfaces analogs that hold up under listening review. That is the work that follows. This dispatch is the calibration.
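The head / jam / tail split can be written down as plain arithmetic. The sketch below assumes fixed ninety-second head and tail windows, as described above; the function names `split_sections` and `mmss` are illustrative and not taken from the dispatch's own pipeline.

```python
def split_sections(total_seconds, head_seconds=90, tail_seconds=90):
    """Return (start, end) offsets in seconds for the head, jam, and tail."""
    if total_seconds < head_seconds + tail_seconds:
        raise ValueError("recording is shorter than head plus tail")
    jam_end = total_seconds - tail_seconds
    return {
        "head": (0, head_seconds),
        "jam": (head_seconds, jam_end),
        "tail": (jam_end, total_seconds),
    }


def mmss(seconds):
    """Format a whole number of seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"


# UIC '98 runs 24:06, i.e. 1446 seconds.
sections = split_sections(24 * 60 + 6)
jam_start, jam_end = sections["jam"]
jam_length = mmss(jam_end - jam_start)
```

With a 24:06 run time this leaves a jam of 21:06, which matches the "twenty-one minutes of jam" figure.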

A short glossary, if you want it

Type II. Phish-fan word for a jam that abandons the song and turns into the band listening to each other.

Detector. One specific test I run on a recording. There are three so far, named A, B, C.

Cosine distance. A 0-to-1 number for how different two things are. 0 means identical, 1 means nothing in common.

Chord-finder. The bit of the pipeline that names the dominant chord in each thirty-second window of audio.

Drone ratio. How many times longer the band sat on each chord in the jam compared to the head. UIC '98 sits at 3.98, almost four times longer.

Figure 01 · The chord ribbon

Where the band sits, over twenty-four minutes.

Each cell is the dominant chord during one thirty-second window. Color is chord identity. The thin white ticks underneath are minute marks. Read it left-to-right with the audio. The first ninety seconds are the composed head: many chords. The long swath of one color in the middle is the band sitting on a single chord.

FIG-01 · Dominant chord per 30-second window
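One way to build a ribbon like this is to pick, for each thirty-second window, the chord that was held longest inside it. The sketch below is a hedged illustration only: it assumes the chord-finder emits `(start_seconds, end_seconds, chord_name)` segments, which is a hypothetical input format, and the toy segments are made up.

```python
from collections import defaultdict


def dominant_chords(segments, total_seconds, window=30.0):
    """Return the longest-held chord in each window (None if a window is empty)."""
    ribbon = []
    start = 0.0
    while start < total_seconds:
        end = min(start + window, total_seconds)
        held = defaultdict(float)
        for seg_start, seg_end, chord in segments:
            # Seconds of this segment that fall inside the current window.
            overlap = min(end, seg_end) - max(start, seg_start)
            if overlap > 0:
                held[chord] += overlap
        ribbon.append(max(held, key=held.get) if held else None)
        start = end
    return ribbon


# Hypothetical one-minute excerpt: a busy first window, then a held chord.
toy_segments = [
    (0, 10, "Gmaj"),
    (10, 18, "Cmaj"),
    (18, 30, "Gmaj"),
    (30, 60, "Cmaj"),
]
ribbon = dominant_chords(toy_segments, total_seconds=60)
```

In the toy excerpt the first window resolves to Gmaj (22 of 30 seconds) and the second to Cmaj.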
Figure 02 · The energy contour

Two summits, about eight minutes apart.

Loudness over time. The two peaks at 11:30 and 19:30 are the dynamic climaxes, moments where the band locks in and the room comes up with them. The central drone passage sits between them, holding the energy steady before each release.

FIG-02 · Loudness over time
Reference · The three subtypes

What the algorithm is looking for.

A reference card before the detector-signal plot below. Each subtype is a different way a band can leave a song. A performance can fire any subset. UIC '98 fires the full set, which is why I am using it as the calibration anchor.

A
Key departure

Leaving the home key.

The jam ventures to a non-home key. Detected by checking, window by window, how far the tonal center has drifted from the head's key.

Threshold 0.10  ·  cosine distance (a 0-to-1 number where 0 means identical key and 1 means unrelated key)
UIC '98: 0.34 (peak window 0.66, about as far from G major as the metric goes in this space)
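A minimal sketch of the window-by-window check, assuming each tonal center is represented as a 12-bin pitch-class profile. The binary triad profiles below are a deliberately crude stand-in for real chroma features, and `triad_profile` and `key_departure_flags` are illustrative names. Whether the real detector compares with `>=` or `>` at the threshold is also an assumption.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("profiles must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


def triad_profile(pitch_classes):
    """Toy 12-bin profile: 1.0 for each pitch class in the triad, else 0.0."""
    return [1.0 if pc in pitch_classes else 0.0 for pc in range(12)]


def key_departure_flags(head_profile, window_profiles, threshold=0.10):
    """True for each window whose distance from the head meets the threshold."""
    return [cosine_distance(head_profile, w) >= threshold for w in window_profiles]


g_major = triad_profile({7, 11, 2})  # G, B, D
c_major = triad_profile({0, 4, 7})   # C, E, G
distance_home = cosine_distance(g_major, g_major)
distance_away = cosine_distance(g_major, c_major)
flags = key_departure_flags(g_major, [g_major, c_major])
```

In this toy setup the two triads share one note out of three, so the distance comes out at two thirds. Real chroma profiles are not binary, so real distances will differ.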
B
Vocabulary expansion

New chords, same key.

The jam stays near the home key but uses a chord palette the head never visits. I count up how many seconds the band spent on each chord during the head, then do the same for the jam, and measure how different the two counts are.

Threshold 0.18  ·  Jensen-Shannon divergence (a way of comparing two distributions; runs 0 to 0.69, where 0 means same chord palette and 0.69 means no chords in common)
UIC '98: 0.21 · highest in corpus
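The comparison described above can be sketched as follows, assuming the inputs are dictionaries of seconds-per-chord for the head and the jam (a hypothetical format). The natural logarithm is used because it gives the 0 to 0.69 range quoted above (ln 2 is about 0.693). The chord names and durations in the example are made up.

```python
import math


def to_distribution(seconds_by_chord, chords):
    """Normalise seconds-per-chord into probabilities over a shared chord list."""
    total = sum(seconds_by_chord.values())
    if total <= 0:
        raise ValueError("need a positive total duration")
    return [seconds_by_chord.get(chord, 0.0) / total for chord in chords]


def kl_divergence(p, m):
    """Kullback-Leibler divergence in nats; terms with p == 0 contribute nothing."""
    return sum(pi * math.log(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def js_divergence(head_seconds, jam_seconds):
    """Jensen-Shannon divergence between two chord-duration tallies."""
    chords = sorted(set(head_seconds) | set(jam_seconds))
    p = to_distribution(head_seconds, chords)
    q = to_distribution(jam_seconds, chords)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical tallies, in seconds.
head = {"Gmaj": 40, "Cmaj": 30, "Emin": 20}
same_palette = js_divergence(head, head)
no_overlap = js_divergence({"Gmaj": 90}, {"Fmaj": 600})
```

Identical palettes give 0 and palettes with no chords in common give ln 2, the two ends of the scale described above.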
C
Sustained drone

Sitting on one chord.

The band holds individual chords for ten to thirty-plus seconds. Measured by the ratio of average chord-segment length in the jam compared to the head.

Threshold 1.5×  ·  how many times longer the band sat on each chord in the jam compared to the head; under 1.0 means the jam was busier than the head
UIC '98: 3.98× (the band held chords almost four times longer in the jam than in the head)
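A minimal sketch of the ratio, assuming the chord-finder yields one chord label per fixed-length analysis frame (the five-second hop and the label sequences below are hypothetical). Consecutive identical labels are collapsed into segments, and the mean segment length in the jam is divided by the mean in the head.

```python
from itertools import groupby


def mean_segment_seconds(frame_labels, hop_seconds):
    """Average length, in seconds, of runs of consecutive identical labels."""
    run_lengths = [sum(1 for _ in run) for _, run in groupby(frame_labels)]
    if not run_lengths:
        raise ValueError("no frames to measure")
    return hop_seconds * sum(run_lengths) / len(run_lengths)


def drone_ratio(head_labels, jam_labels, hop_seconds):
    """Mean jam segment length divided by mean head segment length."""
    return (mean_segment_seconds(jam_labels, hop_seconds)
            / mean_segment_seconds(head_labels, hop_seconds))


# Hypothetical labels: a busy head, then a jam that sits on two chords.
head_labels = ["Gmaj", "Cmaj", "Gmaj", "Emin", "Cmaj", "Gmaj"]
jam_labels = ["Cmaj"] * 6 + ["Fmaj"] * 2
ratio = drone_ratio(head_labels, jam_labels, hop_seconds=5)
fires_c = ratio >= 1.5
```

Here the head changes chord every frame (5-second segments) and the jam averages 20-second segments, so the toy ratio is 4.0, well above the 1.5× threshold.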
Figure 03 · The subtype signals

A and B fire together at 1:30.

This is the moment that makes UIC '98 a triple-fire performance. The pink line is Subtype A (key-departure). The blue line is Subtype B (chord-vocabulary divergence). Each has a dashed threshold. The bottom strip shades pink/blue/orange when each subtype is firing, A on top, B middle, C (drone) at the bottom.

FIG-03 · Subtype detector signals + firing strip
Both signals cross threshold around 1:30. The launch is noisy. A drops below threshold for individual windows at roughly 3:00 and 4:00, and B is intermittent through the same period, before both lock in continuously from 5:30 onward. That stuttery 1:30 to 5:30 zone is the band committing in starts and stops. The post-5:30 stretch is the sustained Type II. The drone strip (orange) fades in by 5:30 and dominates the central section. All three signals release in the final minute as the band works back home.
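The firing strip and the lock-in point can be sketched as a threshold mask followed by a search for the longest unbroken run. The signal values below are invented to mimic the shape described above (A dropping out at 3:00 and 4:00, B intermittent); they are not the measured UIC '98 values, and the function names are illustrative.

```python
def firing_strip(signal, threshold):
    """True for each window where the signal meets its threshold."""
    return [value >= threshold for value in signal]


def longest_run(flags):
    """Return (start_index, length) of the longest run of True values."""
    best_start, best_length = None, 0
    run_start = None
    for index, on in enumerate(list(flags) + [False]):  # sentinel closes a final run
        if on and run_start is None:
            run_start = index
        elif not on and run_start is not None:
            if index - run_start > best_length:
                best_start, best_length = run_start, index - run_start
            run_start = None
    return best_start, best_length


def window_clock(index, window_seconds=30):
    """Clock time (m:ss) at which window `index` starts."""
    seconds = index * window_seconds
    return f"{seconds // 60}:{seconds % 60:02d}"


# Hypothetical first eight minutes, one value per thirty-second window.
a_signal = [0.02, 0.02, 0.02, 0.15, 0.20, 0.20, 0.05, 0.20,
            0.05, 0.30, 0.30, 0.30, 0.30, 0.30, 0.30, 0.30]
b_signal = [0.05, 0.05, 0.05, 0.19, 0.20, 0.10, 0.20, 0.19,
            0.20, 0.20, 0.10, 0.22, 0.22, 0.22, 0.22, 0.22]

a_on = firing_strip(a_signal, threshold=0.10)
b_on = firing_strip(b_signal, threshold=0.18)
both_on = [a and b for a, b in zip(a_on, b_on)]

first_fire = window_clock(both_on.index(True))
lock_start, lock_length = longest_run(both_on)
lock_in = window_clock(lock_start)
```

On the toy data the first joint fire lands at 1:30 and the longest joint run starts at 5:30, which is the "noisy launch, then lock-in" pattern the figure shows.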
03 · Minute by minute

The score, marked up.

A typographic transcription of the time-series above. Clock at left. Subtype state in the middle. Field note at right. The two intensity peaks fall at roughly 11:30 and 19:30, a textbook two-summit dynamic arc.

0:00 to 1:30
composed head
Composed head in G major. Five to nine distinct chords per thirty-second window, typical Bathtub Gin movement through verse and chorus. Top chords: Cmaj · Gmaj · Emin.
1:30
A + B fire (noisy)
Subtypes A and B fire simultaneously. Band moves from G major into C major. This is when the performance starts to commit, but the launch is not clean: A drops back below threshold for one window at roughly 3:00 and again at 4:00, and B is intermittent through the same span. Per-window dropouts in the launch period are typical. The chord-finder (the bit of the pipeline that names the dominant chord in each thirty-second window) briefly grabs home-key content during transitional moments. By 5:30 both signals lock in for good.
5:30 to 15:00
full A + B + C
The central drone passage. For most thirty-second windows the chord-finder finds only one chord: the band sits on a single Cmaj for thirty-plus seconds at a stretch. The drone ratio (jam-segment length over head-segment length) reaches 3.98, second only to Worcester '95 (4.05) in the entire corpus.
11:30
peak I
First intensity climax. Loudest moment so far. The drone resolves upward into the room.
16:00 to 16:30
A peaks
Peak key-distance from the head. The Subtype A signal reaches 0.664, the most harmonically distant moment of the twenty-four minutes. This is the band as far from G major as it ever gets.
19:30
peak II
Second intensity climax. Same loudness as 11:30. Two distinct peaks of equal height, the second one slightly farther from home key than the first.
21:00 →
return
Band starts working back toward home. Subtype signals stay on through 23:00, then begin to subside in roughly the reverse of the order they came on: C first, then B, then A as the head returns.
04 · UIC '98 in context

The reference set, all twelve.

The small reference set I hand-picked for calibration: six community-tagged Type IIs spanning three decades, two unremarkable 2010s versions, three 2024 baselines, and one monster from Phish's 2003 return tour after the 2000-02 hiatus. Twelve performances is not a survey of Bathtub Gin. There are hundreds. It is a working sample chosen to span the kinds of behavior I want the detectors to discriminate.

What I use it for: confirm that the three jamcharts-tagged Type IIs (UIC '98, Virginia Beach '98, Bethel '24) all fire A + B + C, and confirm that the unremarkable versions and baselines do not. Both checks pass. Worth flagging that no single subtype is what makes UIC '98 the calibration anchor: Worcester '95 actually has a slightly higher drone ratio (4.05 vs UIC's 3.98), and Bethel '24 edges UIC on key departure (0.43 vs 0.34). UIC is canonical because it fires the strongest conjunction of all three together, and the conjunction is the definition of textbook Type II by the consensus.

The detectors agree on UIC because the thresholds (0.10 for A, 0.18 for B, 1.5× for C) were calibrated against eight canonical Phish Type II performances I had already accepted as the reference. That is construct-validity, not a discovery. The same thresholds also don't fire on the unremarkable and baseline performances I tested, which is a sanity check that calibration didn't over-fit catastrophically, but it is not predictive validity.

One more disclosure: the twelve performances span 1995 to 2024 across audience tapes, soundboards, and the 2024 Sphere production, all unified under the relisten provider but otherwise heterogeneous in recording style. The chroma-based metrics (A and B) are mostly robust to that, since the chromagram is a relative pitch-class profile. The drone ratio (C) is more exposed. The interesting work happens later, when I point the same chain at tape nobody has tagged.

Performance | Length | Head key | Jam key | Key dep. | Chord div. | Drone ratio | Subtypes firing
1995 · 12 · 29 · Worcester · CCT, MA | 11:06 | C maj | C maj | 0.00 | 0.05 | 4.05 | C
1997 · 08 · 17 · Loring · Limestone, ME · “Loaded Gin” | 15:21 | C maj | C maj | 0.00 | 0.17 | 1.98 | B + C
1998 · 07 · 29 · Riverport · Maryland Heights, MO | 24:04 | G min | G min | 0.00 | 0.11 | 2.04 | B + C
1998 · 08 · 09 · Virginia Beach · GTE Amphitheater · jamcharts | 15:02 | G min | C maj | 0.24 | 0.12 | 2.72 | A + B + C
1998 · 11 · 09 · UIC · UIC Pavilion · jamcharts | 24:06 | G maj | C maj | 0.34 | 0.21 | 3.98 | A + B + C
1999 · 12 · 31 · Big Cypress · Seminole Reservation · NYE | 16:23 | G min | C maj | 0.18 | 0.06 | 1.80 | A + C
2003 · 02 · 22 · Cincinnati · U.S. Bank Arena · first run after the 2000-02 hiatus | 26:43 | C maj | C maj | 0.00 | 0.07 | 3.90 | C
2010 · 08 · 06 · Berkeley · Greek Theatre, CA | 11:07 | C min | C maj | 0.10 | 0.11 | 1.20 | A + B
2014 · 07 · 15 · CMAC · Canandaigua, NY | 11:25 | G maj | C maj | 0.10 | 0.13 | 1.42 | A + B
2024 · 04 · 19 · Sphere · baseline | 14:18 | C maj | C maj | 0.00 | 0.06 | 1.09 | None
2024 · 07 · 28 · Alpine Valley · baseline | 12:03 | C maj | C maj | 0.00 | 0.05 | 0.82 | None
2024 · 08 · 11 · Bethel · Bethel Woods · jamcharts (recent) | 18:49 | C maj | F maj | 0.43 | 0.11 | 1.57 | A + B + C

A note on the Bethel call. Bethel '24's whole-jam chord-vocabulary divergence averages to 0.11, below the 0.18 threshold the table is checking. I score it as Subtype B anyway because the score for individual thirty-second windows crosses 0.18 during the F-major passage in the back half. The whole-jam average smears that excursion out; the window-by-window view catches it. That is a measurement-method choice, not a free pass. It is the same choice that promoted Bethel from “A + C partial” on an earlier multi-metric scorecard to “A + B + C” here. A different reasonable threshold would leave it as A + C. I flag it because that kind of borderline call is exactly what the dispatch should surface, not bury.
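The difference between the two scoring rules is easy to see on a toy series. The per-window scores below are invented so that they average 0.11 with a short excursion above 0.18; they are not Bethel '24's measured values, and treating the whole-jam figure as a simple mean of window scores is an assumption made for illustration.

```python
def fires_on_average(window_scores, threshold):
    """Whole-jam rule: the mean of the window scores must meet the threshold."""
    return sum(window_scores) / len(window_scores) >= threshold


def fires_on_any_window(window_scores, threshold):
    """Window-by-window rule: a single window meeting the threshold is enough."""
    return any(score >= threshold for score in window_scores)


# Hypothetical chord-divergence scores, one per thirty-second window.
toy_scores = [0.05, 0.06, 0.05, 0.07, 0.08, 0.06, 0.20, 0.22, 0.19, 0.12]
threshold_b = 0.18

whole_jam_call = fires_on_average(toy_scores, threshold_b)
windowed_call = fires_on_any_window(toy_scores, threshold_b)
windows_over = [i for i, score in enumerate(toy_scores) if score >= threshold_b]
```

The same data yields "no B" under the averaging rule and "B" under the windowed rule, which is why the choice of rule has to be stated alongside the score. A stricter variant could require several consecutive windows over threshold before calling it.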

05 · The validation step

Take the calibrated detector, aim it at unseen tape.

A reference example is only useful if the detector tuned against it generalises. So once I had anchored A + B + C against UIC '98 (and cross-checked against Virginia Beach '98 and Bethel '24), I ran the same chain over recordings the community has not flagged. Two early candidates came back as triple-fires from outside the Bathtub Gin set. These are working candidates, not pronouncements. Both still need close listening review before I would call either a confirmed Type II. The reason to show them here is methodological: this is what the validation step looks like, with all the rough edges visible. Denominators matter, so: the Great Woods '24 Tweezer surfaced from a small set of seven baseline Tweezers where three of seven triple-fired, which is high enough that it could be the chain miscalibrating against 2024-era recordings rather than three real discoveries. The Goose IWD4U surfaced from a forty-eight-performance Goose 2025-2026 application where one performance fired all three, which is rare enough to be its own signal. Different evidence weight on the two candidates.
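The denominators above translate into very different triple-fire rates. A short worked calculation, using only the counts stated in the paragraph (the dictionary labels are illustrative):

```python
# (triple-fires, performances examined), as stated in the text.
candidate_sets = {
    "Phish Tweezer baselines": (3, 7),
    "Goose 2025-2026 set": (1, 48),
}

rates = {name: fires / total for name, (fires, total) in candidate_sets.items()}
percentages = {name: round(rate * 100, 1) for name, rate in rates.items()}

# How many times more often the Tweezer set triple-fires than the Goose set.
rate_ratio = rates["Phish Tweezer baselines"] / rates["Goose 2025-2026 set"]
```

Roughly 43% of the Tweezer baselines triple-fire against about 2% of the Goose set, a factor of about twenty. A rate that high in a supposedly unremarkable set is more consistent with miscalibration than with three discoveries, which is the caution the paragraph raises.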

Match · 01 · Phish

Great Woods Tweezer

2024 · 07 · 21 · 21:48 minutes

Key dep. 0.513 · Chord div. 0.161 · Drone ratio 1.94

Plus a tail-outlier score of 0.177, where most Tweezer tails in the reference set come in under 0.10. The score measures how unusual the last ninety seconds sound versus other Tweezer endings; higher means more unusual. That is the highest of the Tweezers I have looked at, including the 1995 Memphis fifty-minute Tweezer at 0.087. This is also the most chord-divergent Tweezer in the reference set, almost two standard deviations above the mean for the song. A candidate worth a closer listen, not a verdict.

Match · 02 · Goose

I Would Die 4 U

2026 · 04 · 24 · 18:00 minutes · Prince cover

Key dep. 0.630 · Chord div. 0.295 · Drone ratio 1.61

A four-minute Prince cover taken into eighteen minutes of A + B + C territory by a different band entirely. The case is on the bench for a future dispatch, once listener calibration is in hand.

06 · What follows

This is the first dispatch.

A calibration note, not a textbook. Each subtype has its own anatomy. Each candidate match has its own minute-by-minute story. The reason to write in public, slowly, against a small bench of recordings, is to keep the reasoning visible while it is still being figured out. A short list of what is next on the bench:

I post when something is worth filing, not on a schedule. If you want to hear about new dispatches, send a note: zabriskieapp@gmail.com.
