| title | Coptic Translation Interface |
|---|---|
| emoji | 🔮 |
| colorFrom | green |
| colorTo | indigo |
| sdk | docker |
| app_port | 7860 |
| pinned | false |
| license | apache-2.0 |
| short_description | Coptic↔English translation + neural-symbolic parser |

A research tool for Coptic language analysis that combines neural machine translation with neural-symbolic dependency parsing.

The interface integrates three tools for Coptic language research:
- Neural Machine Translation (Coptic ↔ English)
- Neural-Symbolic Dependency Parser (Stanza + Prolog)
- Grammatical Validation (Walter Till's grammar + Crum's lexicon)

Neural Machine Translation:
- Coptic → English: megalaa/coptic-english-translator
- English → Coptic: megalaa/english-coptic-translator
- Dialects: Sahidic (literary standard) and Bohairic (liturgical)
- Virtual Keyboard: 31 Coptic Unicode characters
- Example Corpus: simple sentences, complex structures, and full texts
- Models: MarianMT fine-tuned on 50,000+ CopticScriptorium parallel sentences

Neural-Symbolic Dependency Parser:
- Neural Layer: Stanza pipeline (tokenization, POS tagging, lemmatization) and DiaParser (dependency trees)
- Symbolic Layer:
  - Prolog implementation of Walter Till's Coptic Grammar (1955)
  - Integration with Crum's Coptic Dictionary (1939)
  - Grammatical pattern detection (e.g., tripartite nominal sentences)
  - Error detection for neural parser hallucinations
- Export: CoNLL-U format for corpus linguistics research

Grammatical Validation (Prolog):
- Detects grammatical patterns from Walter Till's grammar
- Validates dependency structures against linguistic rules
- Identifies common parsing errors and hallucinations
- Provides grammatical warnings and suggestions
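
The Prolog rules themselves are not reproduced in this README. As a rough illustration of the kind of rule-based check a symbolic layer can run over neural parser output, here is a minimal Python sketch that applies purely structural well-formedness rules (valid heads, a single root, no cycles) and collects warnings. `Token` and `validate_tree` are hypothetical names, not part of the project's API, and the real validator encodes Till's grammar and goes well beyond these generic checks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    id: int       # 1-based position in the sentence
    form: str
    head: int     # 0 means "attached to the artificial root"
    deprel: str


def validate_tree(tokens: list[Token]) -> list[str]:
    """Return warnings for structural problems in one sentence's dependency parse."""
    warnings: list[str] = []
    n = len(tokens)
    heads = {t.id: t.head for t in tokens}

    # Rule 1: a head must be 0 (root) or the id of another token in the sentence.
    for t in tokens:
        if not 0 <= t.head <= n:
            warnings.append(f"token {t.id} ({t.form}): head {t.head} is out of range")
        elif t.head == t.id:
            warnings.append(f"token {t.id} ({t.form}): token is its own head")

    # Rule 2: exactly one token attaches to the root, and it is labelled 'root'.
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1:
        warnings.append(f"expected exactly one root, found {len(roots)}")
    for t in roots:
        if t.deprel != "root":
            warnings.append(f"token {t.id} ({t.form}): attached to root with label '{t.deprel}'")

    # Rule 3: following heads upward from any token must reach the root without looping.
    for t in tokens:
        seen: set[int] = set()
        current = t.id
        while current != 0 and current in heads:
            if current in seen:
                warnings.append(f"token {t.id} ({t.form}): head chain contains a cycle")
                break
            seen.add(current)
            current = heads[current]
    return warnings


if __name__ == "__main__":
    # Hand-written, illustrative parse with a deliberate cycle between tokens 1 and 2
    faulty = [
        Token(1, "ⲁⲛⲟⲕ", 2, "nsubj"),
        Token(2, "ⲡⲉ", 1, "cop"),
        Token(3, "ⲡⲛⲟⲩⲧⲉ", 0, "root"),
    ]
    for warning in validate_tree(faulty):
        print(warning)
```

Returning a list of warnings instead of raising mirrors the behavior described above: the parse is kept, and problems are reported alongside it.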

Requirements:
- Python 3.10+
- SWI-Prolog 8.0+ (for Prolog validation)
- Docker (for containerized deployment)
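
To confirm the requirements above before installing, a short stdlib-only check can help. This is a hypothetical helper, not a script shipped with the repository; it assumes SWI-Prolog is reachable through its standard `swipl` executable and that Stanza must be importable for the `stanza.download('cop')` step.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which of the documented prerequisites are available on this machine."""
    return {
        # Python 3.10+ is required.
        "python>=3.10": sys.version_info >= (3, 10),
        # SWI-Prolog provides the `swipl` executable; needed only for Prolog validation.
        "swipl": shutil.which("swipl") is not None,
        # Docker is needed only for containerized deployment.
        "docker": shutil.which("docker") is not None,
        # Stanza must be importable to download the Coptic ('cop') models.
        "stanza": importlib.util.find_spec("stanza") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        status = "OK" if ok else "MISSING"
        print(f"{status:<8}{name}")
```

Only the Python version is a hard requirement for every setup; a missing `docker` is fine for a local install, and a missing `swipl` only disables Prolog validation.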

Installation:

```bash
# Clone the repository
git clone https://github.com/Rogaton/coptic-translation-interface.git
cd coptic-translation-interface

# Install dependencies
pip install -r requirements.txt

# Download Stanza Coptic models
python -c "import stanza; stanza.download('cop')"

# Run the interface
python app.py
```

Docker deployment:

```bash
# Build the Docker image
docker build -t coptic-interface .

# Run the container
docker run -p 7860:7860 coptic-interface
```

Access the interface at http://localhost:7860. It has three tabs:
- Coptic → English: Translate Coptic text to English
- English → Coptic: Translate English text to Coptic
- Dependency Analysis: Parse Coptic text with neural-symbolic validation

Using a translation model directly from Python:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the Coptic → English model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("megalaa/coptic-english-translator")
model = AutoModelForSeq2SeqLM.from_pretrained("megalaa/coptic-english-translator")

# Translate
coptic_text = "ⲡϫⲟⲉⲓⲥ ⲡⲉ ⲡⲁⲛⲟⲩⲧⲉ"
inputs = tokenizer(coptic_text, return_tensors="pt")
outputs = model.generate(**inputs)
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
# Output: "The Lord is my God"
```

Using the parser from Python:

```python
from coptic_parser_core import CopticParserCore

# Initialize the parser
parser = CopticParserCore()
parser.load_parser()

# Parse text
result = parser.parse_text("ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ")

# Export to CoNLL-U
conllu = parser.format_conllu(result)
print(conllu)
```

Architecture:

```
Input Text
    ↓
[Neural Layer]
    ↓
Stanza Pipeline → Tokenization, POS, Lemmatization
    ↓
DiaParser → Dependency Trees
    ↓
[Symbolic Layer]
    ↓
Prolog Rules → Grammatical Validation
    ↓
Till Grammar + Crum Lexicon → Error Detection
    ↓
Output: Validated Parse + Warnings
```
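
The parser example above exports its results with `format_conllu`. For readers new to CoNLL-U, the following stdlib-only sketch shows the shape of one sentence in that format: `# sent_id` and `# text` comment lines, then one line per token with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) using `_` for empty fields, and a blank line closing the sentence. `ConlluToken` and `to_conllu` are hypothetical names; the project's `format_conllu` may differ, for example in how it handles multiword tokens or validation warnings, which this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConlluToken:
    # The ten CoNLL-U columns; "_" marks an empty field.
    id: int
    form: str
    lemma: str = "_"
    upos: str = "_"
    xpos: str = "_"
    feats: str = "_"
    head: int = 0
    deprel: str = "_"
    deps: str = "_"
    misc: str = "_"


def to_conllu(sent_id: str, text: str, tokens: list[ConlluToken]) -> str:
    """Serialize one sentence: comment lines, one tab-separated line per token, then a blank line."""
    lines = [f"# sent_id = {sent_id}", f"# text = {text}"]
    for t in tokens:
        columns = [str(t.id), t.form, t.lemma, t.upos, t.xpos, t.feats,
                   str(t.head), t.deprel, t.deps, t.misc]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Hand-written, illustrative annotation (not output of the project's parser)
    tokens = [
        ConlluToken(1, "ⲁⲛⲟⲕ", head=3, deprel="nsubj"),
        ConlluToken(2, "ⲡⲉ", head=3, deprel="cop"),
        ConlluToken(3, "ⲡⲛⲟⲩⲧⲉ", head=0, deprel="root"),
    ]
    print(to_conllu("example-1", "ⲁⲛⲟⲕ ⲡⲉ ⲡⲛⲟⲩⲧⲉ", tokens), end="")
```

Concatenating the strings returned for several sentences yields a complete CoNLL-U document, since each sentence already ends with its separating blank line.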

Translation Models:
Both translation models use:
- Architecture: MarianMT (Seq2Seq Transformer)
- Training Data: CopticScriptorium parallel corpus (50,000+ sentences)
- Preprocessing: Coptic Unicode → Greek transcription
- Dialect Tags: Cyrillic markers (з for Sahidic, б for Bohairic)
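
The preprocessing described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code used to train the megalaa models: it assumes that "Greek transcription" means lowercasing and replacing each Greek-derived Coptic letter with its Greek counterpart, that Demotic-derived letters are left unchanged, and that the dialect marker is prepended with a single space. Check the model cards for the actual conventions before building model input this way.

```python
from __future__ import annotations

# Greek-derived lowercase Coptic letters occupy the odd code points
# U+2C81 (alfa) .. U+2CB1 (oou) in Greek alphabetical order.
# ASSUMPTION: sou (ⲋ, the numeral 6) is mapped to Greek stigma (ϛ).
_GREEK = "αβγδεϛζηθικλμνξοπρστυφχψω"
COPTIC_TO_GREEK = {0x2C81 + 2 * i: greek for i, greek in enumerate(_GREEK)}

# Cyrillic dialect markers described in this README: з (Sahidic), б (Bohairic).
DIALECT_TAGS = {"sahidic": "\u0437", "bohairic": "\u0431"}


def to_greek(text: str) -> str:
    """Lowercase, then map Greek-derived Coptic letters to Greek.

    Demotic-derived letters (ϣ ϥ ϧ ϩ ϫ ϭ ϯ), punctuation and combining marks
    are left unchanged; how the real preprocessing treats them is not specified here.
    """
    return text.lower().translate(COPTIC_TO_GREEK)


def prepare_input(text: str, dialect: str = "sahidic") -> str:
    """ASSUMPTION: the dialect marker is prepended to the text, separated by a space."""
    return f"{DIALECT_TAGS[dialect]} {to_greek(text)}"


if __name__ == "__main__":
    print(prepare_input("ⲡϫⲟⲉⲓⲥ ⲡⲉ ⲡⲁⲛⲟⲩⲧⲉ"))
```

Because the Coptic Unicode block lists the Greek-derived letters in alphabetical order, the mapping is built from code points instead of a hand-typed 25-entry table; the treatment of sou is the only judgment call in it.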

Test Corpus:
The interface includes `coptic_test_corpus.json` with:
- Simple Sentences: 10+ examples (Sahidic & Bohairic)
- Complex Sentences: 5+ examples with subordination
- Full Texts: Biblical narratives and parables
- Grammar Patterns: Tripartite nominals, perfect tense, etc.

Linguistic Resources:
- Walter Till's Grammar (`coptic_grammar.pl`): 700+ Prolog rules
- Crum's Dictionary (`coptic_lexicon.pl`): 12,000+ lexical entries
- Stanza Models: pre-trained on the CopticScriptorium Universal Dependencies treebank

Use Cases:
- Corpus Linguistics: CoNLL-U export for quantitative analysis
- Digital Humanities: Automated parsing of Coptic manuscripts
- Language Learning: Interactive translation with grammatical feedback
- Computational Linguistics: Neural-symbolic architecture research
- Egyptology: Analysis of Coptic Biblical and documentary texts

Performance:
- Translation Quality: BLEU score ~35-40 (Coptic→English)
- Parsing Accuracy: UAS ~85%, LAS ~80% on the CopticScriptorium test set
- Prolog Validation: Detects 70%+ of common parsing errors
- Inference Speed: ~0.5s per sentence (translation), ~2s (parsing with validation)
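
UAS (unlabeled attachment score) is the fraction of tokens whose predicted head is correct; LAS (labeled attachment score) additionally requires the correct dependency relation. The sketch below computes both for token-aligned parses. `attachment_scores` is a hypothetical helper, and it assumes gold tokenization; the README does not say which evaluation setup produced the figures above, and standard tools such as the CoNLL 2018 shared task evaluation script also handle tokenization mismatches and may ignore relation subtypes.

```python
from __future__ import annotations


def attachment_scores(gold: list[tuple[int, str]],
                      pred: list[tuple[int, str]]) -> tuple[float, float]:
    """Return (UAS, LAS) for token-aligned sequences of (head, deprel) pairs."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and predicted parses must be non-empty and token-aligned")
    # UAS counts correct heads; LAS counts tokens where head AND label both match.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return head_ok / len(gold), both_ok / len(gold)


if __name__ == "__main__":
    gold = [(3, "nsubj"), (3, "cop"), (0, "root"), (3, "obl")]
    pred = [(3, "nsubj"), (3, "aux"), (0, "root"), (1, "obl")]
    uas, las = attachment_scores(gold, pred)
    print(f"UAS={uas:.2f} LAS={las:.2f}")
```

In the example, three of four heads are right (UAS 0.75) but token 2 has the wrong label, so only two tokens count for LAS (0.50). Over a test set, the counts are summed across all sentences before dividing.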

If you use this interface in your research, please cite:

```bibtex
@software{linden2025coptic,
  author    = {Linden, André},
  title     = {Coptic Translation and Parsing Interface: A Neural-Symbolic Approach},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19487216},
  url       = {https://doi.org/10.5281/zenodo.19487216},
  version   = {1.0.1}
}
```

References:
- Enis, M. & Megalaa, A. (2024). Ancient voices, modern technology: Low-resource neural machine translation for Coptic texts.
- Till, W. C. (1955). Koptische Grammatik (Saïdischer Dialekt). Leipzig: VEB Verlag Enzyklopädie.
- Crum, W. E. (1939). A Coptic Dictionary. Oxford: Clarendon Press.

This project is licensed under the Apache License 2.0; see the LICENSE file for details. Third-party components and data carry their own licenses:
- Translation Models: megalaa models (various licenses)
- Stanza: Apache License 2.0
- Prolog Rules: CC BY-NC-SA 4.0 (based on Till's grammar)
- Lexicon: Public domain (Crum's dictionary)

Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/improvement`)
- Commit your changes (`git commit -m 'Add new feature'`)
- Push to the branch (`git push origin feature/improvement`)
- Open a Pull Request

Areas for contribution:
- Additional dialect support (Akhmimic, Lycopolitan, Fayyumic)
- Improved Prolog rules for complex constructions
- Enhanced error detection algorithms
- Extended test corpus with documentary texts
- Performance optimizations

Acknowledgments:
- CopticScriptorium for the parallel corpus and UD annotations
- Amir Zeldes and team for Coptic NLP resources
- Stanford NLP Group for Stanza
- megalaa for the translation models
- Walter Till for the foundational Coptic grammar
- W. E. Crum for the comprehensive dictionary

Links:
- Live Demo: Hugging Face Space
- Translation Models: megalaa/coptic-english-translator, megalaa/english-coptic-translator
- Dependency Parser: coptic-dependency-parser
- GitHub Profile: @Rogaton

Contact:
André Linden
- Email: linden@bluewin.ch
- GitHub: @Rogaton

Changelog:
- v1.0.1 (2025-04-09): Zenodo archive release
  - DOI: 10.5281/zenodo.19487216
  - Updated documentation with contact information
  - All features from v1.0.0
- v1.0.0 (2025-04-09): Initial release
  - Neural machine translation (Coptic ↔ English)
  - Neural-symbolic dependency parser
  - Prolog grammatical validation
  - Web interface with examples

Built with ❤️ for Coptic language research and digital humanities