feat: add Internet Culturale (ICCU) provider #159

Open
nikazzio wants to merge 1 commit into main from feat/iccu-internetculturale-provider

Adds full support for the ICCU aggregator (internetculturale.it), which covers
Biblioteca Medicea Laurenziana, Biblioteca Nazionale Marciana, BNCF, BNCR and
50+ other Italian institutions, via the MAG/XML API.

Changes:

  • resolvers/mag_parser.py: MAG XML → IIIF v2 manifest converter
    • parse_mag_xml(): parse bibinfo, pages, build standard IIIF v2 manifest
    • IccuMetadata: structured metadata (library, city, sbn_code, shelfmark, oai_id, teca)
    • fetch_and_convert(): download + convert magparser endpoint
    • is_iccu_magparser_url(): intercept hook for downloader pipeline
    • Uses defusedxml for safe XML parsing
  • resolvers/internetculturale.py: resolver class
    • can_resolve(): matches internetculturale.it URLs and known OAI prefixes
    • get_manifest_url(): extracts OAI ID + teca, returns magparser URL
    • OAI prefix → teca lookup table (Marciana, BML, MagTeca-ICCU)
  • resolvers/search/internetculturale.py: HTML scraper for IC manuscript search
    • Targets /it/16/search?instance=magindice&channel__typeTipo=Manoscritto
    • Extracts OAI ID, teca, title, author, date, library, thumbnail per result
  • providers.py: register Internet Culturale provider (sort_order=5, first in UI)
  • logic/downloader.py: intercept magparser URLs before get_json() call
  • resolvers/discovery.py + search/__init__.py: wire search_internetculturale
  • tests/test_providers.py: update sort order assertion
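
For reviewers unfamiliar with the target format, here is a minimal sketch of the IIIF Presentation v2 structure that a MAG-to-IIIF conversion has to produce. It is not the code in resolvers/mag_parser.py: the function name `build_iiif_v2_manifest`, the page dict keys (`label`, `image_url`, `width`, `height`), the example.org URLs and the `image/jpeg` format are all assumptions made for illustration. Only the `@context`, `@type` and `motivation` values come from the IIIF v2 specification.

```python
import json

# Context URI defined by the IIIF Presentation API 2.x specification.
IIIF_V2_CONTEXT = "http://iiif.io/api/presentation/2/context.json"


def build_iiif_v2_manifest(manifest_id, label, pages):
    """Build a minimal IIIF v2 manifest from a list of page dicts.

    Hypothetical input shape: each page is a dict with the keys
    'label', 'image_url', 'width' and 'height'.
    """
    canvases = []
    for index, page in enumerate(pages, start=1):
        # One canvas per page; the canvas URI scheme is an assumption.
        canvas_id = f"{manifest_id}/canvas/{index}"
        canvases.append({
            "@id": canvas_id,
            "@type": "sc:Canvas",
            "label": page["label"],
            "width": page["width"],
            "height": page["height"],
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                # The annotation must point back at its own canvas.
                "on": canvas_id,
                "resource": {
                    "@id": page["image_url"],
                    "@type": "dctypes:Image",
                    "format": "image/jpeg",  # assumed image format
                    "width": page["width"],
                    "height": page["height"],
                },
            }],
        })

    return {
        "@context": IIIF_V2_CONTEXT,
        "@id": manifest_id,
        "@type": "sc:Manifest",
        "label": label,
        "sequences": [{"@type": "sc:Sequence", "canvases": canvases}],
    }


if __name__ == "__main__":
    # Illustrative data only; not taken from a real MAG record.
    demo_pages = [
        {"label": "1r", "image_url": "https://example.org/img/1.jpg",
         "width": 1000, "height": 1500},
        {"label": "1v", "image_url": "https://example.org/img/2.jpg",
         "width": 1000, "height": 1500},
    ]
    manifest = build_iiif_v2_manifest(
        "https://example.org/manifest/demo", "Demo manuscript", demo_pages
    )
    print(json.dumps(manifest, indent=2))
```

Emitting a plain v2 manifest like this is what lets the rest of the downloader pipeline treat ICCU items the same way as native IIIF sources once the magparser URL has been intercepted.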

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
