I am building a small set of detectors that try to flag when a jam-band performance has left the song behind. Before the detectors are useful for finding anything in the corpus, I need to know they behave the way a careful listener would on cases the community already agrees on. UIC '98 Bathtub Gin is one of those cases. Twenty-four minutes, jamcharts-tagged, decades-deep consensus. This dispatch walks through what each of the three detectors does on that recording, minute by minute.
I needed a reference Type II performance to anchor the detectors against, something the community had already settled on. ("Type II" is Phish-fan vernacular for an improvised jam that abandons the composition and turns into the band listening to each other.) The jamcharts (a community-curated list at phish.net of jams the fanbase has flagged as worth hearing) list a handful of Bathtub Gins as Type II. UIC '98 is the canonical pick of those. So I pulled it, ran the three subtype detectors I have so far, and watched each one light up where I expected it to.
The protocol is plain. Pick the performance the community has long agreed is a textbook Type II. Run all three detectors in sequence over the audio. Show that each subtype lights up where a careful listener would say it should: home key abandoned (Subtype A), chord palette the head never visits (Subtype B), individual chords held for thirty-plus seconds in a sustained drone (Subtype C). Then cross-check on the other Type-II-tagged Gins in the small reference set. Virginia Beach 1998-08-09 and Bethel 2024-08-11 fire all three with different shapes. The unflagged versions don't. Worcester '95 has the drone but never leaves home; the 2024-04-19 and 2024-07-28 versions barely move at all. The point is not a leaderboard. The point is: the detectors agree with the consensus on the cases the consensus is loud about.
Vocabulary, since I lean on it. The head is the composed first ninety seconds (the song as written). The jam is the improvised middle (whatever the band decides to do that night). The tail is the last ninety seconds (the band working back to the head). For UIC '98 that is twenty-four minutes total: about a minute and a half of head, twenty-one minutes of jam, then a return.
Once a detector behaves sensibly against canonical references, the natural next step is to point it at unseen tape, recordings the community has not flagged, ideally from a different band, and ask whether it surfaces analogs that hold up under listening review. That is the work that follows. This dispatch is the calibration.
Type II. Phish-fan word for a jam that abandons the song and turns into the band listening to each other.
Detector. One specific test I run on a recording. There are three so far, named A, B, C.
Cosine distance. A number for how different two things are. For non-negative counts like the ones used here it runs from 0 to 1: 0 means the same proportions, 1 means nothing in common.
Chord-finder. The bit of the pipeline that names the dominant chord in each thirty-second window of audio.
Drone ratio. How many times longer the band sat on each chord in the jam compared to the head. UIC '98 sits at 3.98, almost four times longer.
Each cell is the dominant chord during one thirty-second window. Color is chord identity. The thin white ticks underneath are minute marks. Read it left-to-right with the audio. The first ninety seconds are the composed head: many chords. The long swath of one color in the middle is the band sitting on a single chord.
Loudness over time. The two peaks at 11:30 and 19:30 are the dynamic climaxes, moments where the band locks in and the room comes up with them. The central drone passage sits between them, holding the energy steady before each release.
A reference card before the detector-signal plot below. Each subtype is a different way a band can leave a song. A performance can fire any subset. UIC '98 fires the full set, which is why I am using it as the calibration anchor.
Subtype A (key departure). The jam ventures to a non-home key. Detected by checking, window by window, how far the tonal center has drifted from the head's key.
Subtype B (chord-vocabulary divergence). The jam stays near the home key but uses a chord palette the head never visits. I count up how many seconds the band spent on each chord during the head, then do the same for the jam, and measure how different the two counts are.
Subtype C (drone). The band holds individual chords for ten to thirty-plus seconds. Measured by the ratio of average chord-segment length in the jam compared to the head.
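
The B and C measurements can be sketched in a few lines. This is a minimal illustration, not the pipeline's code. It assumes the chord-finder's output has already been collapsed into `(chord, seconds)` segments, and that "how different the two counts are" means the cosine distance from the glossary. The function names and the toy head and jam are hypothetical.

```python
import math
from collections import Counter


def seconds_per_chord(segments):
    """Total seconds spent on each chord; segments are (chord, seconds) pairs."""
    totals = Counter()
    for chord, seconds in segments:
        totals[chord] += seconds
    return totals


def cosine_distance(a, b):
    """1 minus cosine similarity of two chord-duration counts (0 to 1 here)."""
    chords = set(a) | set(b)
    dot = sum(a.get(c, 0.0) * b.get(c, 0.0) for c in chords)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # assumption: an empty profile shares nothing with anything
    return 1.0 - dot / (norm_a * norm_b)


def chord_vocab_divergence(head_segments, jam_segments):
    """Subtype B sketch: distance between head and jam chord-duration counts."""
    return cosine_distance(seconds_per_chord(head_segments),
                           seconds_per_chord(jam_segments))


def drone_ratio(head_segments, jam_segments):
    """Subtype C sketch: mean jam segment length over mean head segment length."""
    def mean_length(segments):
        return sum(seconds for _, seconds in segments) / len(segments)
    return mean_length(jam_segments) / mean_length(head_segments)


# Toy data (hypothetical): a busy head and a jam that sits on two chords.
toy_head = [("C", 10), ("F", 10), ("G", 10), ("C", 10)]
toy_jam = [("C", 40), ("Bb", 40)]

toy_divergence = chord_vocab_divergence(toy_head, toy_jam)
toy_drone = drone_ratio(toy_head, toy_jam)
```

On the toy data the jam's segments average four times the head's, so the drone ratio is 4.0, and the divergence is well above zero because the jam spends half its time on a chord the head never plays.
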
This is the moment that makes UIC '98 a triple-fire performance. The pink line is Subtype A (key-departure). The blue line is Subtype B (chord-vocabulary divergence). Each has a dashed threshold. The bottom strip shades pink/blue/orange when each subtype is firing, A on top, B middle, C (drone) at the bottom.
A typographic transcription of the time-series above. Clock at left. Subtype state in the middle. Field note at right. The two intensity peaks fall at roughly 11:30 and 19:30, a textbook two-summit dynamic arc.
The small reference set I hand-picked for calibration: six community-tagged Type IIs spanning three decades, two unremarkable 2010s versions, three 2024 baselines, and one monster from Phish's 2003 return tour after the 2000-02 hiatus. Twelve performances is not a survey of Bathtub Gin. There are hundreds. It is a working sample chosen to span the kinds of behavior I want the detectors to discriminate.
What I use it for: confirm that the three jamcharts-tagged Type IIs (UIC '98, Virginia Beach '98, Bethel '24) all fire A + B + C, and confirm that the unremarkable versions and baselines do not. Both checks pass.
Worth flagging that no single subtype is what makes UIC '98 the calibration anchor: Worcester '95 actually has a slightly higher drone ratio (4.05 vs UIC's 3.98), and Bethel '24 edges UIC on key departure (0.43 vs 0.34). UIC is canonical because it fires the strongest conjunction of all three together, and the conjunction is the definition of textbook Type II by the consensus.
The detectors agree on UIC because the thresholds (0.10 for A, 0.18 for B, 1.5× for C) were calibrated against eight canonical Phish Type II performances I had already accepted as the reference. That is construct-validity, not a discovery. The same thresholds also don't fire on the unremarkable and baseline performances I tested, which is a sanity check that calibration didn't over-fit catastrophically, but it is not predictive validity.
One more disclosure: the twelve performances span 1995 to 2024 across audience tapes, soundboards, and the 2024 Sphere production, all unified under the relisten provider but otherwise heterogeneous in recording style. The chroma-based metrics (A and B) are mostly robust to that, since the chromagram is a relative pitch-class profile. The drone ratio (C) is more exposed. The interesting work happens later, when I point the same chain at tape nobody has tagged.
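
As a cross-check on the thresholds, here is a hedged sketch that re-derives the A and C calls from the key-departure and drone-ratio figures in the reference table. The figures and the 0.10 and 1.5× thresholds are from this dispatch. Treating each threshold as inclusive is an assumption, inferred from rows where a key departure of exactly 0.10 is reported as firing A. Subtype B is left out because its call can depend on per-window scores that the table does not list. All names are hypothetical.

```python
A_THRESHOLD = 0.10  # key departure, from the text
C_THRESHOLD = 1.5   # drone ratio, from the text

# (label, key departure, drone ratio, subtypes reported in the table)
ROWS = [
    ("Worcester 1995-12-29", 0.00, 4.05, "C"),
    ("Loring 1997-08-17", 0.00, 1.98, "B + C"),
    ("Riverport 1998-07-29", 0.00, 2.04, "B + C"),
    ("Virginia Beach 1998-08-09", 0.24, 2.72, "A + B + C"),
    ("UIC 1998-11-09", 0.34, 3.98, "A + B + C"),
    ("Big Cypress 1999-12-31", 0.18, 1.80, "A + C"),
    ("Cincinnati 2003-02-22", 0.00, 3.90, "C"),
    ("Berkeley 2010-08-06", 0.10, 1.20, "A + B"),
    ("CMAC 2014-07-15", 0.10, 1.42, "A + B"),
    ("Sphere 2024-04-19", 0.00, 1.09, "None"),
    ("Alpine Valley 2024-07-28", 0.00, 0.82, "None"),
    ("Bethel 2024-08-11", 0.43, 1.57, "A + B + C"),
]


def fired_a_and_c(key_departure, drone_ratio):
    """Subtypes A and C implied by the thresholds (inclusive, by assumption)."""
    fired = set()
    if key_departure >= A_THRESHOLD:
        fired.add("A")
    if drone_ratio >= C_THRESHOLD:
        fired.add("C")
    return fired


def reported_a_and_c(subtypes):
    """The A and C part of a 'Subtypes firing' cell such as 'A + B + C'."""
    names = subtypes.replace(" ", "").split("+")
    return {name for name in names if name in ("A", "C")}


# Rows where the thresholds and the reported subtypes disagree on A or C.
mismatches = [
    label
    for label, key_departure, drone, reported in ROWS
    if fired_a_and_c(key_departure, drone) != reported_a_and_c(reported)
]
```

With these inputs `mismatches` comes back empty: every A and C call in the table follows from the stated thresholds.
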
| Date | Performance | Venue / note | Length | Head key | Jam key | Key dep. | Chord div. | Drone ratio | Subtypes firing |
|---|---|---|---|---|---|---|---|---|---|
| 1995-12-29 | Worcester | CCT, MA | 11:06 | C maj | C maj | 0.00 | 0.05 | 4.05 | C |
| 1997-08-17 | Loring | Limestone, ME · “Loaded Gin” | 15:21 | C maj | C maj | 0.00 | 0.17 | 1.98 | B + C |
| 1998-07-29 | Riverport | Maryland Heights, MO | 24:04 | G min | G min | 0.00 | 0.11 | 2.04 | B + C |
| 1998-08-09 | Virginia Beach | GTE Amphitheater · jamcharts | 15:02 | G min | C maj | 0.24 | 0.12 | 2.72 | A + B + C |
| 1998-11-09 | ★ UIC | UIC Pavilion · Chicago, IL | 24:06 | G maj | C maj | 0.34 | 0.21 | 3.98 | A + B + C |
| 1999-12-31 | Big Cypress | Seminole Reservation · NYE | 16:23 | G min | C maj | 0.18 | 0.06 | 1.80 | A + C |
| 2003-02-22 | Cincinnati | U.S. Bank Arena · first run after the 2000-02 hiatus | 26:43 | C maj | C maj | 0.00 | 0.07 | 3.90 | C |
| 2010-08-06 | Berkeley | Greek Theatre, CA | 11:07 | C min | C maj | 0.10 | 0.11 | 1.20 | A + B |
| 2014-07-15 | CMAC | Canandaigua, NY | 11:25 | G maj | C maj | 0.10 | 0.13 | 1.42 | A + B |
| 2024-04-19 | Sphere | baseline | 14:18 | C maj | C maj | 0.00 | 0.06 | 1.09 | None |
| 2024-07-28 | Alpine Valley | baseline | 12:03 | C maj | C maj | 0.00 | 0.05 | 0.82 | None |
| 2024-08-11 | Bethel | Bethel Woods · jamcharts (recent) | 18:49 | C maj | F maj | 0.43 | 0.11† | 1.57 | A + B + C |
† A note on the Bethel call. Bethel '24's whole-jam chord-vocabulary divergence averages to 0.11, below the 0.18 threshold the table is checking. I score it as Subtype B anyway because the score for individual thirty-second windows crosses 0.18 during the F-major passage in the back half. The whole-jam average smears that excursion out; the window-by-window view catches it. That is a measurement-method choice, not a free pass. It is the same choice that promoted Bethel from “A + C partial” on an earlier multi-metric scorecard to “A + B + C” here. A different reasonable threshold would leave it as A + C. I flag it because that kind of borderline call is exactly what the dispatch should surface, not bury.
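
The difference between the two scoring rules is easy to show on made-up numbers. The sketch below uses a hypothetical list of per-window divergence scores chosen to average 0.11. These are not Bethel's actual window scores, which this dispatch does not list. Only the 0.18 threshold comes from the text.

```python
B_THRESHOLD = 0.18  # chord-vocabulary divergence threshold, from the text


def whole_jam_fires(window_scores, threshold=B_THRESHOLD):
    """Whole-jam rule: fire only if the mean window score reaches the threshold."""
    return sum(window_scores) / len(window_scores) >= threshold


def any_window_fires(window_scores, threshold=B_THRESHOLD):
    """Window-by-window rule: fire if any single window reaches the threshold."""
    return max(window_scores) >= threshold


# Hypothetical scores: quiet for most of the jam, one late excursion.
example_windows = [0.05, 0.06, 0.05, 0.07, 0.06, 0.20, 0.22, 0.17]

example_mean = sum(example_windows) / len(example_windows)
fires_on_average = whole_jam_fires(example_windows)
fires_on_a_window = any_window_fires(example_windows)
```

Here the mean is about 0.11, so the whole-jam rule stays quiet, while the two windows at 0.20 and 0.22 trip the window-by-window rule. The same recording gets a different B call depending on which rule is in force.
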
A reference example is only useful if the detector tuned against it generalises. So once I had anchored A + B + C against UIC '98 (and cross-checked against Virginia Beach '98 and Bethel '24), I ran the same chain over recordings the community has not flagged. Two early candidates came back as triple-fires from outside the Bathtub Gin set. These are working candidates, not pronouncements. Both still need close listening review before I would call either a confirmed Type II. The reason to show them here is methodological: this is what the validation step looks like, with all the rough edges visible. Denominators matter, so: the Great Woods '24 Tweezer surfaced from a small set of seven baseline Tweezers where three of seven triple-fired, which is high enough that it could be the chain miscalibrating against 2024-era recordings rather than three real discoveries. The Goose IWD4U surfaced from a forty-eight-performance Goose 2025-2026 application where one performance fired all three, which is rare enough to be its own signal. Different evidence weight on the two candidates.
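
The denominator point in numbers, as a small worked sketch. The counts are the ones quoted above; the function name is hypothetical.

```python
def fire_rate(triple_fires, performances):
    """Share of a set of performances that fired all three subtypes."""
    return triple_fires / performances


tweezer_rate = fire_rate(3, 7)   # baseline Tweezers
goose_rate = fire_rate(1, 48)    # Goose 2025-2026 application

print(f"Tweezer set: {tweezer_rate:.1%}")  # Tweezer set: 42.9%
print(f"Goose set:   {goose_rate:.1%}")    # Goose set:   2.1%
```

A triple-fire is roughly twenty times more common in the Tweezer set than in the Goose set, which is why the same label carries less evidential weight for the Tweezer candidate.
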
Great Woods '24 Tweezer · 2024-07-21 · to 21:48 minutes
The tail-outlier score is 0.177, where most Tweezer tails in the reference set come in under 0.10. The score measures how unusual the last ninety seconds sound versus other Tweezer endings; higher means more unusual. At 0.177 it is the highest of the Tweezers I have looked at, including the 1995 Memphis fifty-minute Tweezer at 0.087. This is also the most chord-divergent Tweezer in the reference set, almost two standard deviations above the mean for the song. A candidate worth a closer listen, not a verdict.
Goose IWD4U · 2026-04-24 · to 18:00 minutes · Prince cover
A four-minute Prince cover taken into eighteen minutes of A + B + C territory by a different band entirely. The case is on the bench for a future dispatch, once listener calibration is in hand.
A calibration note, not a textbook. Each subtype has its own anatomy. Each candidate match has its own minute-by-minute story. The reason to write in public, slowly, against a small bench of recordings, is to keep the reasoning visible while it is still being figured out. A short list of what is next on the bench:
I post when something is worth filing, not on a schedule. If you want to hear about new dispatches, send a note: zabriskieapp@gmail.com.