Improvements for TV Logo lookup matching. #9

Open
form400 wants to merge 5 commits into daniel-widrick:main from form400:main

form400 commented May 17, 2026

Improve TV logo matching rate from 23% to ~67%

Summary

This PR addresses poor TV logo matching against the tv-logo/tv-logos
repository. Analysis of a real 616-channel lineup showed only 142 (23%)
of channels were resolving to a logo. After two rounds of investigation,
the matcher now resolves an estimated 414 (67%), roughly a 3× improvement.

Root cause analysis

Two distinct failure modes were identified by inspecting actual cache
contents and channel metadata:

  1. Local broadcast affiliates were generated with the wrong path
    (e.g., abc-7-kabc-us.png instead of us-local/abc-7-kabc-us.png).
  2. Cable channels with cryptic callsigns and empty affiliate names
    (93% of the test lineup) were unmatchable. GraceNote sends
    callSign=HISTORY, affiliateName="", but the repo slug is
    history-channel-us.png. Direct slugification produced history,
    which matches no repo slug.

Changes

Round 1 — 4c4563b improve tv-logo channel matching rate

  • Add local-affiliate slug patterns: {network}-{channelNo}-{callsign} and
    {network}-{callsign}
  • Fix slugify: & now converts to "and" (was "-"), fixing A&E,
    AT&T SportsNet, etc.
  • Fix affiliateAliases: "the weather channel" → weather-channel
    (was incorrectly the-weather-channel)
  • Trim noiseWords to stop stripping network, channel, tv,
    entertainment — these words appear in many repo slugs
  • Add networkSlugs map for ABC/CBS/NBC/FOX/CW/PBS/Telemundo/Univision
  • Add bareCallSign helper stripping both dash-separated (-TV, -DT,
    -HD, -LD) and inline (HD, DT) suffixes
  • Reorder candidate generation for higher-precision-first matching
  • Thread ChannelNo through guide.Channel and tvlogo.Resolve so
    affiliate patterns can be built
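
The slug and local-affiliate changes above can be sketched as follows. This is a minimal illustration, not the code in the PR: the function bodies, the `localAffiliateCandidates` name, and the assumption that both affiliate patterns live under `us-local/` with a `-us.png` suffix are inferred from the `us-local/abc-7-kabc-us.png` example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches runs of characters that are not lowercase letters or digits.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// slugify lowercases s, spells "&" as "and", and joins the remaining
// alphanumeric runs with single hyphens.
func slugify(s string) string {
	s = strings.ToLower(s)
	s = strings.ReplaceAll(s, "&", " and ")
	s = nonAlnum.ReplaceAllString(s, "-")
	return strings.Trim(s, "-")
}

// localAffiliateCandidates (hypothetical name) returns candidate repo paths
// for a local broadcast affiliate, most specific pattern first:
// {network}-{channelNo}-{callsign}, then {network}-{callsign}.
// The us-local/ prefix and -us.png suffix are assumptions for this sketch.
func localAffiliateCandidates(network, channelNo, callSign string) []string {
	n, c := slugify(network), slugify(callSign)
	if n == "" || c == "" {
		return nil
	}
	var out []string
	if ch := slugify(channelNo); ch != "" {
		out = append(out, fmt.Sprintf("us-local/%s-%s-%s-us.png", n, ch, c))
	}
	out = append(out, fmt.Sprintf("us-local/%s-%s-us.png", n, c))
	return out
}

func main() {
	fmt.Println(slugify("A&E")) // a-and-e
	for _, p := range localAffiliateCandidates("ABC", "7", "KABC") {
		fmt.Println(p)
	}
}
```

Emitting the channel-number pattern first mirrors the higher-precision-first ordering described above: the more specific path is tried before the looser one.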

Round 2 — ff88d5a add callsign abbreviation map for cryptic GraceNote callsigns

  • Add callsignSlugs map (280 entries) covering cable networks,
    premium movie families (Starz / MGM+ / Cinemax / Showtime / TMC),
    sports, news, kids, Spanish-language, and shopping channels. Lookup
    runs against both the raw callsign and its suffix-stripped form.
  • Extend hdSuffixRe to strip HDP, HP, and standalone P (Plus
    variants)
  • Iterate suffix stripping so compound suffixes collapse fully
    (MAXHDP → MAXHD → MAX)
  • Add matcherVersion constant + MatcherVersion field on CacheEntry.
    Cache.Get invalidates failed entries from older matcher versions,
    forcing automatic re-check on next access. Successful matches are
    preserved across versions, so deploys don't churn already-working
    logos.
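
A rough sketch of the iterated suffix stripping and the two-step callsign lookup follows. The regular expression, the `lookupCallSign` helper, and the map contents are assumptions made for illustration. Only the HISTORY → history-channel pairing comes from this PR; the MAX entry is a placeholder, and the real map has about 280 entries.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// suffixRe is a stand-in for hdSuffixRe: it matches one trailing suffix,
// either dash-separated (-TV, -DT, -HD, -LD) or inline (HD, DT, HP, P).
var suffixRe = regexp.MustCompile(`(-(TV|DT|HD|LD)|HD|DT|HP|P)$`)

// bareCallSign strips trailing suffixes repeatedly until nothing changes,
// so compound suffixes collapse step by step (MAXHDP -> MAXHD -> MAX).
// It never strips a callsign down to the empty string.
func bareCallSign(cs string) string {
	cs = strings.ToUpper(strings.TrimSpace(cs))
	for {
		next := suffixRe.ReplaceAllString(cs, "")
		if next == cs || next == "" {
			return cs
		}
		cs = next
	}
}

// callsignSlugs is a tiny illustrative excerpt, not the PR's map.
var callsignSlugs = map[string]string{
	"HISTORY": "history-channel", // pairing taken from the PR description
	"MAX":     "cinemax",         // hypothetical entry for this sketch
}

// lookupCallSign (hypothetical helper) tries the raw callsign first and
// then its suffix-stripped form, returning the repo file name on a hit.
func lookupCallSign(cs string) (string, bool) {
	raw := strings.ToUpper(strings.TrimSpace(cs))
	for _, key := range []string{raw, bareCallSign(raw)} {
		if slug, ok := callsignSlugs[key]; ok {
			return slug + "-us.png", true
		}
	}
	return "", false
}

func main() {
	fmt.Println(bareCallSign("MAXHDP")) // MAX
	fmt.Println(lookupCallSign("HISTORY"))
	fmt.Println(lookupCallSign("MAXHDP"))
	fmt.Println(lookupCallSign("CCTV4"))
}
```

Checking the raw callsign before the stripped form matters because a standalone trailing P is an aggressive rule: a callsign that legitimately ends in P should still match its own map entry before any stripping is attempted.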

Impact

Measured against a real tvlogo_cache.json from a 616-channel lineup:

| Metric | Before | After |
| --- | --- | --- |
| Channels matched | 142 | 414 |
| Match rate | 23.0% | 67.2% |
| Failed channels | 474 | 202 |

The remaining ~200 unmatched channels are predominantly foreign-language
networks (CCTV4, ZEETV, CGTN, MBCKOR, BANDAUS, TV5MOND, …), regional
sports variants without dedicated upstream logos (FDS* Bally
regionals, WNBA ION sub-feeds), and provider-specific channels like
LOOR* — all of which have no logo in the upstream repo.

Deployment notes

The matcherVersion bump from 1 → 2 means failed entries from the
old matcher auto-retry on the next scrape (~95 seconds at 5 req/sec
for ~474 misses). No manual cache wipe is required. Successful matches
from the previous matcher version are preserved.
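
The version-gated invalidation can be sketched like this. Only `matcherVersion`, `MatcherVersion`, `CacheEntry`, and `Cache.Get` are names from this PR; the `Matched` and `LogoURL` fields, the map-backed `Cache`, and the `Get` signature are assumptions for illustration.

```go
package main

import "fmt"

// matcherVersion is bumped whenever the matching logic improves.
const matcherVersion = 2

// CacheEntry records the outcome of one lookup. Matched and LogoURL are
// hypothetical field names; MatcherVersion is the field this PR adds.
type CacheEntry struct {
	LogoURL        string
	Matched        bool
	MatcherVersion int
}

// Cache is a simplified in-memory stand-in for the on-disk cache.
type Cache struct {
	entries map[string]CacheEntry
}

// Get returns the cached entry for key. A failed entry written by an older
// matcher is reported as a miss so the caller retries it; a successful
// entry is returned regardless of the version that produced it.
func (c *Cache) Get(key string) (CacheEntry, bool) {
	e, ok := c.entries[key]
	if !ok {
		return CacheEntry{}, false
	}
	if !e.Matched && e.MatcherVersion < matcherVersion {
		return CacheEntry{}, false
	}
	return e, true
}

func main() {
	c := &Cache{entries: map[string]CacheEntry{
		"HISTORY": {Matched: false, MatcherVersion: 1},
		"KABC":    {Matched: true, MatcherVersion: 1, LogoURL: "us-local/abc-7-kabc-us.png"},
	}}
	_, hit := c.Get("HISTORY")
	fmt.Println("HISTORY cached:", hit) // false: stale failure is retried
	_, hit = c.Get("KABC")
	fmt.Println("KABC cached:", hit) // true: old success is kept
}
```

If the cache is decoded with encoding/json, entries written before the field existed would decode with a MatcherVersion of 0, so a less-than comparison treats them as stale as well.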

Test plan

  • Compile-checked via go build after each round
  • Match rate quantified against a real tvlogo_cache.json + xmlguide.xmltv from a production GraceNote run (616 channels)
  • Local-affiliate slug generation verified against the actual tv-logo/tv-logos repo manifest
  • Cache version-invalidation tested: failed v1 entries return as cache misses, matched v1 entries return as cache hits
  • Smoke-test on production server: deploy, force a scrape, verify match count in updated tvlogo_cache.json lands near 414

form400 and others added 5 commits May 16, 2026 23:34
- Add local affiliate pattern: {network}-{channelNo}-{callsign} and {network}-{callsign}
- Fix slugify: & converts to "and" instead of "-" (fixes A&E, AT&T SportsNet, etc.)
- Fix weather channel alias: "the-weather-channel" → "weather-channel"
- Correct noiseWords: stop stripping "network", "channel", "tv", "entertainment"
- Add networkSlugs map for ABC/CBS/NBC/FOX/CW/PBS/Telemundo/Univision local affiliates
- Add bareCallSign: strips both dash-separated (-TV, -DT, -HD) and inline (HD, DT) suffixes
- Reorder candidates: alias → full slug → local affiliate patterns → bare callsign → fallbacks
- Thread ChannelNo through Channel struct and Resolve() for affiliate pattern generation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GraceNote returns empty affiliateName for ~93% of cable channels and
provides only a cryptic callsign (e.g., "HISTORY", "TOON", "PAR", "STZENCL").
Direct slugification produced "history", "toon", "par" — none of which
match the actual repo slugs like "history-channel-us.png".

- Add callsignSlugs map (280 entries) covering cable networks, premium
  movie families (Starz/MGM+/Cinemax/Showtime/TMC), sports, news, kids,
  Spanish-language, and shopping channels. Lookup runs against both the
  raw callsign and its suffix-stripped form.
- Extend hdSuffixRe to strip HDP, HP, and standalone P (Plus variants).
- Iterate suffix stripping so compound suffixes collapse fully
  (MAXHDP -> MAXHD -> MAX).
- Add matcherVersion constant + MatcherVersion field on CacheEntry.
  Cache.Get() invalidates failed entries from older versions so logic
  improvements take effect without a manual cache wipe; successful
  matches are preserved across versions.

Projected impact on a 616-channel lineup with 93% empty-affiliate
cable channels: 23.0% -> 67.2% match rate (272 additional channels).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>