Summary
Complete the transcription export pipeline. Several formats are already declared in EXPORT_FORMATS with available: False — flip them to True and implement the renderers.
Motivation
Researchers need to export their transcriptions in standard formats for publication, interoperability with other tools (Transkribus, eScriptorium, Kraken), and archival.
Formats to implement
Plain Text (.txt)
- One page per section, separated by --- or page numbers.
- Straightforward concatenation of transcription content.
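A minimal sketch of the plain-text renderer described above. The function name `render_plain_text`, the `use_page_numbers` flag, and the `[Page N]` label format are assumptions for illustration; the issue does not specify them.

```python
from typing import Sequence


def render_plain_text(pages: Sequence[str], use_page_numbers: bool = False) -> str:
    """Concatenate per-page transcriptions into one plain-text document.

    Hypothetical signature: `pages` holds one transcription string per page,
    in reading order.
    """
    sections = []
    for number, text in enumerate(pages, start=1):
        body = text.strip()
        if use_page_numbers:
            # Assumed label format; the issue only says "page numbers".
            sections.append(f"[Page {number}]\n{body}")
        else:
            sections.append(body)
    # Either a `---` rule between pages or a blank line between labelled pages.
    separator = "\n\n" if use_page_numbers else "\n\n---\n\n"
    return separator.join(sections) + "\n"
```

Calling `render_plain_text(["first page", "second page"])` puts a `---` line between the two pages; passing `use_page_numbers=True` labels each page instead.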
Markdown (.md)
- Page headings (## Page N), transcription content, optional image references.
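One way the Markdown renderer could look, as a sketch only. `render_markdown` and its `image_urls` parameter are hypothetical names; where image URLs come from is not stated in the issue.

```python
from typing import Optional, Sequence


def render_markdown(
    pages: Sequence[str],
    image_urls: Optional[Sequence[Optional[str]]] = None,
) -> str:
    """Render transcriptions as Markdown with one `## Page N` heading per page.

    `image_urls` is an assumed optional list aligned with `pages`; a missing
    or None entry means no image reference is emitted for that page.
    """
    parts = []
    for index, text in enumerate(pages):
        number = index + 1
        parts.append(f"## Page {number}")
        url = image_urls[index] if image_urls and index < len(image_urls) else None
        if url:
            # Standard Markdown image syntax: ![alt text](url)
            parts.append(f"![Page {number}]({url})")
        parts.append(text.strip())
    return "\n\n".join(parts) + "\n"
```

Image references are kept optional per page so that a document with partial image coverage still exports cleanly.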
ALTO XML (.alto)
- Standard XML format for OCR output.
- Map transcription text to <TextLine> / <String> elements.
- Include page dimensions from IIIF canvas metadata.
PAGE XML (.page)
- Alternative OCR interchange format used by Transkribus/eScriptorium.
- Similar structure to ALTO but different schema.
Acceptance criteria
Technical notes
- See the EXPORT_FORMATS dict in the export service for the existing stubs.
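The real shape of `EXPORT_FORMATS` is not shown in this issue, so the following is only a guess at how flipping `available` and wiring in a renderer could fit together. The entry fields other than `available`, the `RENDERERS` registry, and the `register_renderer` and `export` helpers are all hypothetical.

```python
from typing import Callable, Dict, Sequence

# Assumed structure: one entry per format, keyed by a short id.
EXPORT_FORMATS: Dict[str, dict] = {
    "txt": {"label": "Plain Text", "extension": ".txt", "available": False},
    "md": {"label": "Markdown", "extension": ".md", "available": False},
    "alto": {"label": "ALTO XML", "extension": ".alto", "available": False},
    "page": {"label": "PAGE XML", "extension": ".page", "available": False},
}

Renderer = Callable[[Sequence[str]], str]
RENDERERS: Dict[str, Renderer] = {}  # hypothetical registry


def register_renderer(format_id: str, renderer: Renderer) -> None:
    """Attach a renderer and mark the format as available in one step."""
    if format_id not in EXPORT_FORMATS:
        raise KeyError(f"Unknown export format: {format_id}")
    RENDERERS[format_id] = renderer
    EXPORT_FORMATS[format_id]["available"] = True


def export(format_id: str, pages: Sequence[str]) -> str:
    """Render pages in the requested format, refusing unavailable ones."""
    entry = EXPORT_FORMATS.get(format_id)
    if entry is None or not entry["available"]:
        raise ValueError(f"Export format not available: {format_id}")
    return RENDERERS[format_id](pages)
```

Tying the `available` flag to renderer registration keeps a format from being advertised before its renderer exists.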