Improvements for TV Logo lookup matching. #9
Open
form400 wants to merge 5 commits into
- Add local affiliate pattern: {network}-{channelNo}-{callsign} and {network}-{callsign}
- Fix slugify: & converts to "and" instead of "-" (fixes A&E, AT&T SportsNet, etc.)
- Fix weather channel alias: "the-weather-channel" → "weather-channel"
- Correct noiseWords: stop stripping "network", "channel", "tv", "entertainment"
- Add networkSlugs map for ABC/CBS/NBC/FOX/CW/PBS/Telemundo/Univision local affiliates
- Add bareCallSign: strips both dash-separated (-TV, -DT, -HD) and inline (HD, DT) suffixes
- Reorder candidates: alias → full slug → local affiliate patterns → bare callsign → fallbacks
- Thread ChannelNo through Channel struct and Resolve() for affiliate pattern generation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GraceNote returns empty affiliateName for ~93% of cable channels and provides only a cryptic callsign (e.g., "HISTORY", "TOON", "PAR", "STZENCL"). Direct slugification produced "history", "toon", "par", none of which match the actual repo slugs like "history-channel-us.png".
- Add callsignSlugs map (280 entries) covering cable networks, premium movie families (Starz/MGM+/Cinemax/Showtime/TMC), sports, news, kids, Spanish-language, and shopping channels. Lookup runs against both the raw callsign and its suffix-stripped form.
- Extend hdSuffixRe to strip HDP, HP, and standalone P (Plus variants).
- Iterate suffix stripping so compound suffixes collapse fully (MAXHDP -> MAXHD -> MAX).
- Add matcherVersion constant + MatcherVersion field on CacheEntry. Cache.Get() invalidates failed entries from older versions so logic improvements take effect without a manual cache wipe; successful matches are preserved across versions.
Projected impact on a 616-channel lineup with 93% empty-affiliate cable channels: 23.0% -> 67.2% match rate (272 additional channels).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Improve TV logo matching rate from 23% to ~67%
Summary
This PR addresses poor TV logo matching against the tv-logo/tv-logos
repository. Analysis of a real 616-channel lineup showed that only 142 (23%)
of channels were resolving to a logo. After two rounds of investigation,
the matcher now resolves an estimated 414 (67%), roughly a 3× improvement.
Root cause analysis
Two distinct failure modes were identified by inspecting actual cache
contents and channel metadata:
- (e.g., abc-7-kabc-us.png instead of us-local/abc-7-kabc-us.png).
- (93% of the test lineup) were unmatchable: GraceNote sends callSign=HISTORY, affiliateName="" but the repo slug is history-channel-us.png. Direct slugification produced history.
Changes
Round 1 (4c4563b): improve tv-logo channel matching rate
- Add local affiliate patterns: {network}-{channelNo}-{callsign} and {network}-{callsign}
- slugify: & now converts to "and" (was "-"), fixing A&E, AT&T SportsNet, etc.
- affiliateAliases: "the weather channel" → weather-channel (was incorrectly the-weather-channel)
- Correct noiseWords to stop stripping network, channel, tv, entertainment; these words appear in many repo slugs
- Add networkSlugs map for ABC/CBS/NBC/FOX/CW/PBS/Telemundo/Univision
- Add bareCallSign helper stripping both dash-separated (-TV, -DT, -HD, -LD) and inline (HD, DT) suffixes
- Thread ChannelNo through guide.Channel and tvlogo.Resolve so affiliate patterns can be built
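The Round 1 slug rules can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the function affiliateCandidates, the regex variables, and the four-entry networkSlugs subset are hypothetical, and the country suffix (-us.png) and the us-local/ directory are left out.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nonAlnum matches runs of characters that cannot appear in a slug.
var nonAlnum = regexp.MustCompile(`[^a-z0-9]+`)

// dashSuffix strips dash-separated broadcast suffixes such as -TV or -DT.
// (Assumed shape; the PR's bareCallSign also handles inline suffixes.)
var dashSuffix = regexp.MustCompile(`-(TV|DT|HD|LD)$`)

// networkSlugs is an illustrative subset of the map described above.
var networkSlugs = map[string]string{
	"ABC": "abc",
	"CBS": "cbs",
	"NBC": "nbc",
	"FOX": "fox",
}

// slugify lowercases s, spells "&" out as "and", and joins the
// remaining alphanumeric runs with single dashes.
func slugify(s string) string {
	s = strings.ToLower(s)
	s = strings.ReplaceAll(s, "&", " and ")
	s = nonAlnum.ReplaceAllString(s, "-")
	return strings.Trim(s, "-")
}

// affiliateCandidates builds {network}-{channelNo}-{callsign} and
// {network}-{callsign} for a known broadcast network, most specific first.
func affiliateCandidates(network, channelNo, callSign string) []string {
	net, ok := networkSlugs[strings.ToUpper(network)]
	if !ok {
		return nil
	}
	call := slugify(dashSuffix.ReplaceAllString(strings.ToUpper(callSign), ""))
	var out []string
	if channelNo != "" {
		out = append(out, net+"-"+channelNo+"-"+call)
	}
	return append(out, net+"-"+call)
}

func main() {
	fmt.Println(slugify("A&E"))                             // a-and-e
	fmt.Println(slugify("AT&T SportsNet"))                  // at-and-t-sportsnet
	fmt.Println(affiliateCandidates("ABC", "7", "KABC-TV")) // [abc-7-kabc abc-kabc]
}
```

Emitting the channel-number form first keeps the most specific candidate ahead of the shorter fallback.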
Round 2 (ff88d5a): add callsign abbreviation map for cryptic GraceNote callsigns
- Add callsignSlugs map (280 entries) covering cable networks, premium movie families (Starz / MGM+ / Cinemax / Showtime / TMC), sports, news, kids, Spanish-language, and shopping channels. Lookup runs against both the raw callsign and its suffix-stripped form.
- Extend hdSuffixRe to strip HDP, HP, and standalone P (Plus variants)
- Iterate suffix stripping so compound suffixes collapse fully (MAXHDP → MAXHD → MAX)
- Add matcherVersion constant + MatcherVersion field on CacheEntry. Cache.Get invalidates failed entries from older matcher versions, forcing automatic re-check on next access. Successful matches are preserved across versions, so deploys don't churn already-working logos.
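One way to picture the Round 2 lookup is the sketch below. It is an assumption-laden illustration, not the PR's implementation: resolveCallsign and suffixRe are hypothetical names, the regex is a simplified stand-in for hdSuffixRe, and the two-entry map stands in for the 280-entry callsignSlugs (the MAX → cinemax entry in particular is made up for the example).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// suffixRe removes one trailing quality suffix per pass.
// Simplified stand-in for hdSuffixRe; the real pattern may differ.
var suffixRe = regexp.MustCompile(`(HD|DT|P)$`)

// callsignSlugs is a tiny illustrative subset of the real map.
var callsignSlugs = map[string]string{
	"HISTORY": "history-channel", // matches repo slug history-channel-us.png
	"MAX":     "cinemax",         // hypothetical entry for illustration
}

// resolveCallsign tries the raw callsign first, then strips one suffix
// at a time and retries, so MAXHDP -> MAXHD -> MAX collapses fully.
func resolveCallsign(callSign string) (string, bool) {
	cs := strings.ToUpper(strings.TrimSpace(callSign))
	for cs != "" {
		if slug, ok := callsignSlugs[cs]; ok {
			return slug, true
		}
		next := suffixRe.ReplaceAllString(cs, "")
		if next == cs {
			break // nothing left to strip
		}
		cs = next
	}
	return "", false
}

func main() {
	fmt.Println(resolveCallsign("HISTORY")) // history-channel true
	fmt.Println(resolveCallsign("MAXHDP"))  // cinemax true
	fmt.Println(resolveCallsign("TOON"))    // (empty slug) false
}
```

Checking the map before each strip means a callsign that legitimately ends in HD or P still matches if it has its own entry.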
Impact
Measured against a real tvlogo_cache.json from a 616-channel lineup:
The remaining ~200 unmatched channels are predominantly foreign-language
networks (CCTV4, ZEETV, CGTN, MBCKOR, BANDAUS, TV5MOND, …), regional
sports variants without dedicated upstream logos (FDS* Bally regionals,
WNBA ION sub-feeds), and provider-specific channels like LOOR*, all of
which have no logo in the upstream repo.
Deployment notes
The matcherVersion bump from 1 → 2 means failed entries from the
old matcher auto-retry on the next scrape (~95 seconds at 5 req/sec
for ~474 misses). No manual cache wipe is required. Successful matches
from the previous matcher version are preserved.
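The invalidation rule described above could look something like the following sketch. Only matcherVersion, MatcherVersion, CacheEntry, and Cache.Get are names from this PR; the URL field, the map-backed Cache, and the convention that an empty URL marks a failed lookup are assumptions made for the example.

```go
package main

import "fmt"

// matcherVersion is bumped whenever matching logic changes.
const matcherVersion = 2

// CacheEntry records one lookup result. An empty URL marks a
// failed lookup (an assumed convention for this sketch).
type CacheEntry struct {
	URL            string
	MatcherVersion int
}

// Cache is a minimal in-memory stand-in for the on-disk cache.
type Cache struct {
	entries map[string]CacheEntry
}

// Get returns a cached entry, but treats failures recorded by an
// older matcher as absent so the caller retries them. Successful
// matches are returned regardless of version.
func (c *Cache) Get(key string) (CacheEntry, bool) {
	e, ok := c.entries[key]
	if !ok {
		return CacheEntry{}, false
	}
	if e.URL == "" && e.MatcherVersion < matcherVersion {
		delete(c.entries, key)
		return CacheEntry{}, false
	}
	return e, true
}

func main() {
	c := &Cache{entries: map[string]CacheEntry{
		"HISTORY": {URL: "", MatcherVersion: 1},                       // old failure
		"KABC":    {URL: "abc-7-kabc-us.png", MatcherVersion: 1},      // old success
		"LOOR1":   {URL: "", MatcherVersion: matcherVersion},          // current failure
	}}
	_, ok := c.Get("HISTORY")
	fmt.Println(ok) // false: retried under the new matcher
	_, ok = c.Get("KABC")
	fmt.Println(ok) // true: preserved across versions
	_, ok = c.Get("LOOR1")
	fmt.Println(ok) // true: failure already recorded by the current matcher
}
```

Keeping current-version failures cached is what bounds the retry cost to a single pass over the old misses.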
Test plan
- go build after each round
- tvlogo_cache.json + xmlguide.xmltv from a production GraceNote run (616 channels)
- tv-logo/tv-logos repo manifest
- tvlogo_cache.json lands near 414