Skip to content

fix: extract YouTube source URLs from index [2][5]#283

Open
MCR-GLOBAL wants to merge 1 commit intoteng-lin:mainfrom
MCR-GLOBAL:fix/youtube-source-url-265
Open

fix: extract YouTube source URLs from index [2][5]#283
MCR-GLOBAL wants to merge 1 commit intoteng-lin:mainfrom
MCR-GLOBAL:fix/youtube-source-url-265

Conversation

@MCR-GLOBAL
Copy link
Copy Markdown

@MCR-GLOBAL MCR-GLOBAL commented Apr 15, 2026

Summary

Test plan

  • New unit test: test_from_api_response_youtube_source_url_at_index5 — verifies deeply nested YouTube URL extraction
  • New integration test: test_list_sources_youtube_url_at_index5 — verifies list() extracts YouTube URL from src[2][5]
  • Existing YouTube tests still pass (URL at [7] path unchanged)
  • Full test suite: 2015 passed, 9 skipped

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved source URL extraction to reliably detect and parse URLs from YouTube, web, PDF, and other source types
    • Enhanced handling of multiple API response data structures with intelligent fallback mechanisms for consistent URL retrieval
    • Strengthened URL detection logic to support various source format variations

YouTube sources store URL data at src[2][5] as [url, video_id, channel]
while src[2][7] is null. Previously only [2][7] was checked, causing
Source.url to always be None for YouTube sources.

Adds YouTube URL fallback in all three extraction paths: list(), and
both nested formats in Source.from_api_response().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 15, 2026

📝 Walkthrough

Walkthrough

This PR fixes YouTube source URL extraction in the NotebookLM API client by implementing multi-location fallback logic. The changes add checks for YouTube URLs at src[2][5][0] alongside existing checks at src[2][7][0] and src[2][0] in both the list() and from_api_response() methods, resolving an issue where YouTube source URLs were always None.

Changes

Cohort / File(s) Summary
URL Extraction Logic
src/notebooklm/_sources.py, src/notebooklm/types.py
Implemented multi-fallback URL extraction: attempts src[2][7][0] (web/PDF), then src[2][5][0] (YouTube), then src[2][0] (direct HTTP). Applied consistently across list() and from_api_response() methods in both shallow and deep-nested code paths.
Test Coverage
tests/integration/test_sources.py, tests/unit/test_types.py
Added integration and unit tests verifying YouTube URL extraction from the src[2][5][0] position, ensuring the fallback logic correctly populates Source.url for YouTube sources.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Through nested indices hopped a determined hare,
YouTube URLs hiding at index five, there!
With fallback logic crafted with care,
Now sources list bright, beyond compare,
A rabbit's fix, both tested and fair! 🎬

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: extract YouTube source URLs from index [2][5]' directly and accurately summarizes the main change: fixing URL extraction for YouTube sources by checking the [2][5] index.
Linked Issues check ✅ Passed All coding objectives from issue #265 are met: YouTube URL fallback added to src[2][5] in SourcesAPI.list() [283], Source.from_api_response() both nested paths [283], with comprehensive test coverage for both paths validating the fix.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #265: three code paths updated for YouTube URL extraction and two test cases added to validate the fix; no unrelated modifications present.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the URL extraction logic for sources to accommodate different API response structures, specifically adding support for YouTube URLs and a fallback for HTTP links. Corresponding unit and integration tests have been included to verify the new extraction paths. I have no feedback to provide as there were no review comments to evaluate.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unit/test_types.py (1)

164-198: Add a medium-nested companion case for the same src[2][5] fallback.

This test covers the deeply nested branch well; adding one medium-nested case would guard both parsing branches in Source.from_api_response() against future regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_types.py` around lines 164 - 198, Add a medium-nested
companion test for the same src[2][5] fallback: create a new test (e.g.,
test_from_api_response_youtube_source_url_at_index5_medium_nesting) that builds
a slightly different nesting shape which still places the YouTube tuple at index
[2][5] of the source payload, call Source.from_api_response(...) and assert
source.id == "src_yt2", source.url ==
"https://www.youtube.com/watch?v=dcWU-qD8ISQ" and source.kind ==
SourceType.YOUTUBE; this ensures Source.from_api_response handles both the deep
and medium nesting branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/unit/test_types.py`:
- Around line 164-198: Add a medium-nested companion test for the same src[2][5]
fallback: create a new test (e.g.,
test_from_api_response_youtube_source_url_at_index5_medium_nesting) that builds
a slightly different nesting shape which still places the YouTube tuple at index
[2][5] of the source payload, call Source.from_api_response(...) and assert
source.id == "src_yt2", source.url ==
"https://www.youtube.com/watch?v=dcWU-qD8ISQ" and source.kind ==
SourceType.YOUTUBE; this ensures Source.from_api_response handles both the deep
and medium nesting branches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 727bebb4-3f02-46c4-b779-1e6f46fdace3

📥 Commits

Reviewing files that changed from the base of the PR and between a997718 and 2f9cc7c.

📒 Files selected for processing (4)
  • src/notebooklm/_sources.py
  • src/notebooklm/types.py
  • tests/integration/test_sources.py
  • tests/unit/test_types.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Source.url is always None for YouTube sources in list() and from_api_response()

1 participant