onnxtr-ocr
The onnxtr-ocr tool performs optical character recognition (OCR) on images and PDFs, extracting text content using lightweight ONNX-based deep learning models optimized for CPU inference. It offers two quality presets that trade speed against accuracy, and it fits workflows that need to process visual content such as screenshots, scanned documents, and photos.
Ideal Use Cases & Fit
This tool excels in scenarios requiring text extraction from visual content within automated workflows. It is particularly effective when:
- Extracting error messages or data from application screenshots
- Processing scanned documents and PDF files for downstream analysis
- Analyzing image-based intelligence sources such as phishing emails or web content captures
- Converting visual data into structured text for use in logic nodes, AI nodes, or integration nodes
It is not recommended for real-time video stream processing or for inputs that need advanced image preprocessing such as noise removal. Basic correction of rotated or skewed pages is available through the straighten option.
Best Practice: Use the fast quality preset for clean, large text (UI screenshots, printed documents). Switch to accurate for dense, small text or content with special characters (JSON payloads, monospace terminal output, code snippets).
Value in Workflows
In security workflows, onnxtr-ocr enables the automated processing of visual content that would otherwise require manual transcription. It can be positioned early in workflows to convert image-based findings into structured text, enabling downstream nodes to perform pattern matching, data extraction, or AI-driven analysis. This is particularly valuable for incident response workflows processing screenshot evidence, or OSINT workflows analyzing image-based intelligence.
Input Data
The tool accepts one or more image or PDF files, specified as comma-separated file paths. Supported formats include JPEG, PNG, TIFF, BMP, WEBP, and PDF.
- format: Comma-separated file paths
- function: target
- required: true
- example: /path/to/screenshot.png,/path/to/document.pdf
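When the list of files comes from an earlier workflow step, it helps to build and check the target string before calling the tool. The sketch below is a hypothetical helper (build_target is not part of onnxtr-ocr) that joins paths with commas and rejects extensions outside the supported formats. It assumes extension matching is case-insensitive and that the tool offers no way to escape a comma inside a path, so such paths are rejected.

```python
from pathlib import Path

# Extensions for the supported formats: JPEG, PNG, TIFF, BMP, WEBP, PDF.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp", ".pdf"}


def build_target(paths: list[str]) -> str:
    """Hypothetical helper: join file paths into a comma-separated target string.

    ASSUMPTION: a comma inside a path cannot be escaped, so such paths are rejected.
    """
    cleaned = []
    for raw in paths:
        path = raw.strip()
        if not path:
            continue  # ignore empty entries, e.g. from a trailing comma
        if "," in path:
            raise ValueError(f"path contains a comma: {path!r}")
        # ASSUMPTION: extension check is case-insensitive (.PNG == .png).
        if Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported file type: {path!r}")
        cleaned.append(path)
    if not cleaned:
        raise ValueError("at least one file path is required")
    return ",".join(cleaned)


print(build_target(["/path/to/screenshot.png", "/path/to/document.pdf"]))
# /path/to/screenshot.png,/path/to/document.pdf
```

Failing early on an unsupported file gives a clearer error than a partial OCR result further down the workflow.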
Configuration
- quality: Controls the recognition quality preset. Two options are available:
  - fast (default): Uses the crnn_mobilenet_v3_small model (~0.2 s/page, 2.1M parameters). Best suited for clean UI text, large fonts, and printed documents where speed is the priority.
  - accurate: Uses the parseq transformer model (~6 s/page, 23.8M parameters). Significantly better on small or dense text, special characters, monospace content, and mixed-format documents such as JSON payloads or terminal output.
- straighten: Straightens pages before text recognition, useful for rotated or skewed images.
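The per-page figures above make the cost of the accurate preset easy to estimate: it is roughly 30 times slower per page. The sketch below is a rough planning aid, not part of the tool. It assumes time scales linearly with page count and ignores model loading and hardware differences. choose_quality is a hypothetical helper that encodes the best-practice guidance for picking a preset.

```python
# Approximate per-page timings quoted for each preset.
# Real timings depend on hardware and page content.
SECONDS_PER_PAGE = {
    "fast": 0.2,      # crnn_mobilenet_v3_small, ~2.1M parameters
    "accurate": 6.0,  # parseq transformer, ~23.8M parameters
}


def estimate_seconds(pages: int, quality: str = "fast") -> float:
    """Rough OCR time, assuming it scales linearly with page count."""
    if quality not in SECONDS_PER_PAGE:
        raise ValueError(f"unknown quality preset: {quality!r}")
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * SECONDS_PER_PAGE[quality]


def choose_quality(dense_or_small_text: bool, special_characters: bool) -> str:
    """Hypothetical heuristic following the best-practice guidance:
    use 'accurate' for dense/small text or special characters
    (JSON, terminal output, code), otherwise 'fast'."""
    return "accurate" if dense_or_small_text or special_characters else "fast"


for preset in ("fast", "accurate"):
    print(preset, round(estimate_seconds(50, preset), 1))
# fast 10.0
# accurate 300.0
```

For a 50-page scan this comes to seconds versus several minutes, which is worth weighing before defaulting to accurate for every input.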
Output Data
The tool outputs a JSON file containing extracted text at multiple granularity levels:
- full_text: All extracted text concatenated across pages
- pages: Per-page breakdown with line-by-line text
- blocks: Individual word-level results with confidence scores and bounding box geometry (normalized coordinates)
- word_count: Total number of words detected
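Downstream nodes usually need to normalize the result shape and discard low-confidence words. Because the output is either a single object or, for several inputs, an array of objects, the sketch below handles both shapes. The key names inside blocks ("value", "confidence", "geometry") and the geometry layout ((xmin, ymin), (xmax, ymax)) are assumptions, since the exact schema is not given here. The sample payload is made up for illustration.

```python
import json


def load_results(raw: str) -> list[dict]:
    """Return a list of per-file results for either output shape."""
    data = json.loads(raw)
    return data if isinstance(data, list) else [data]


def confident_words(result: dict, min_conf: float = 0.8) -> list[str]:
    # ASSUMPTION: each entry in "blocks" has "value" and "confidence" keys.
    return [b["value"] for b in result.get("blocks", []) if b.get("confidence", 0.0) >= min_conf]


def to_pixels(geometry, width: int, height: int) -> tuple[int, int, int, int]:
    # ASSUMPTION: geometry is ((xmin, ymin), (xmax, ymax)) normalized to [0, 1].
    (xmin, ymin), (xmax, ymax) = geometry
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


# Made-up sample shaped after the assumed schema.
sample = json.dumps({
    "full_text": "ERROR 503",
    "word_count": 2,
    "blocks": [
        {"value": "ERROR", "confidence": 0.97, "geometry": [[0.1, 0.2], [0.3, 0.25]]},
        {"value": "503", "confidence": 0.55, "geometry": [[0.32, 0.2], [0.4, 0.25]]},
    ],
})
for result in load_results(sample):
    print(confident_words(result))                                 # ['ERROR']
    print(to_pixels(result["blocks"][0]["geometry"], 1920, 1080))  # (192, 216, 576, 270)
```

Normalized coordinates only need the source image dimensions to map back to pixels, for example when cropping or highlighting a detected word in screenshot evidence.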
For multiple input files, the output is a JSON array with one result object per file.
Updated: 2026-04-01