OCR application for text recognition from images using Tesseract.
- Text recognition from images (PNG, JPG, JPEG, TIFF, BMP)
- Text recognition from PDF files
- Web interface for file upload
- Language selection for recognition (English, Russian)
- Export result (copy, download)
docker build -t ocr-tool .
docker run -p 8080:8080 ocr-tool

mvn exec:java -D"exec.mainClass=com.github.dkrut.WebServer"

- Select file (drag & drop or "Select File" button)
- Select language(s) - English, Russian
- Click "Start OCR"
- Copy or download result
- Ocr - image processing via Tesseract (grayscale preprocessing)
- PdfConverter - PDF to PNG conversion
- WebServer - web interface (Javalin)
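
The grayscale preprocessing mentioned for Ocr can be illustrated with a minimal sketch that uses only the Java standard library. The class name `GrayscaleSketch`, the method `toGrayscale`, and the integer luminance weights (299/587/114) are assumptions made for illustration; the project's actual conversion may differ.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not taken from the project sources.
class GrayscaleSketch {

    /** Returns a TYPE_BYTE_GRAY copy of the source image. */
    static BufferedImage toGrayscale(BufferedImage source) {
        int width = source.getWidth();
        int height = source.getHeight();
        BufferedImage gray = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Weighted sum of the channels (assumed weights), integer arithmetic.
                int luminance = (299 * r + 587 * g + 114 * b) / 1000;
                // Write the gray level straight into the single-band raster.
                gray.getRaster().setSample(x, y, 0, luminance);
            }
        }
        return gray;
    }

    public static void main(String[] args) {
        BufferedImage color = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        color.setRGB(0, 0, 0xFFFFFF); // white pixel
        color.setRGB(1, 0, 0x000000); // black pixel
        BufferedImage gray = toGrayscale(color);
        System.out.println("white -> " + gray.getRaster().getSample(0, 0, 0));
        System.out.println("black -> " + gray.getRaster().getSample(1, 0, 0));
    }
}
```

Writing to the raster directly keeps the gray levels predictable, since it bypasses the color-model conversion that `setRGB` would apply on a grayscale image.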
- File uploaded via web interface
- Temporary folder created: TEMP/ocr-{uuid}/
- For PDF: converted to PNG (pdf-images/)
- Image converted to grayscale (grayscale/)
- Text recognized by Tesseract
- Temporary files deleted
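
The temporary-folder lifecycle in the steps above could look roughly like the following sketch. It assumes that TEMP refers to the system temporary directory (`java.io.tmpdir`); the class `TempWorkspaceSketch` and its methods are hypothetical names, and the PDF conversion and Tesseract steps are replaced by a placeholder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.UUID;
import java.util.stream.Stream;

// Hypothetical helper, not taken from the project sources.
class TempWorkspaceSketch {

    /** Creates <tmpdir>/ocr-{uuid}/ with pdf-images/ and grayscale/ subfolders. */
    static Path createWorkspace() throws IOException {
        // Assumption: TEMP means the JVM's default temporary directory.
        Path root = Path.of(System.getProperty("java.io.tmpdir"), "ocr-" + UUID.randomUUID());
        Files.createDirectories(root.resolve("pdf-images"));
        Files.createDirectories(root.resolve("grayscale"));
        return root;
    }

    /** Deletes the workspace recursively, children before their parents. */
    static void deleteWorkspace(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse path order visits files and subfolders before the folder holding them.
            paths.sorted(Comparator.reverseOrder()).forEach(path -> {
                try {
                    Files.delete(path);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path workspace = createWorkspace();
        try {
            // Placeholder for the real steps: PDF to PNG, grayscale, Tesseract.
            Files.writeString(workspace.resolve("grayscale").resolve("placeholder.txt"), "stub");
            System.out.println("Working in " + workspace);
        } finally {
            // Cleanup runs even if recognition fails.
            deleteWorkspace(workspace);
        }
    }
}
```

Placing the cleanup in a `finally` block is one way to make sure the temporary files are removed even when a step throws.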