onnxtr-ocr

The onnxtr-ocr tool performs optical character recognition (OCR) on images and PDFs, extracting text content using lightweight ONNX-based deep learning models optimized for CPU inference. It offers two quality presets to balance speed and accuracy, making it a versatile component for workflows that need to process visual content such as screenshots, scanned documents, and photos.

Ideal Use Cases & Fit

This tool excels in scenarios requiring text extraction from visual content within automated workflows. It is particularly effective when:

Extracting error messages or data from application screenshots
Processing scanned documents and PDF files for downstream analysis
Analyzing image-based intelligence sources such as phishing emails or web content captures
Converting visual data into structured text for use in logic nodes, AI nodes, or integration nodes

It is not recommended for real-time video stream processing, or for scenarios requiring advanced image preprocessing such as noise removal or rotation correction beyond what the straighten option provides for skewed images.

Best Practice: Use the fast quality preset for clean, large text (UI screenshots, printed documents). Switch to accurate for dense, small text or content with special characters (JSON payloads, monospace terminal output, code snippets).

Value in Workflows

In security workflows, onnxtr-ocr enables the automated processing of visual content that would otherwise require manual transcription. It can be positioned early in workflows to convert image-based findings into structured text, enabling downstream nodes to perform pattern matching, data extraction, or AI-driven analysis. This is particularly valuable for incident response workflows processing screenshot evidence, or OSINT workflows analyzing image-based intelligence.

Input Data

The tool accepts one or more image or PDF files, specified as comma-separated file paths. Supported formats include JPEG, PNG, TIFF, BMP, WEBP, and PDF.

format: Comma-separated file paths
function: target
required: true
example: /path/to/screenshot.png,/path/to/document.pdf
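
As an illustration of the input format, the following minimal Python sketch joins a list of paths into the comma-separated string the target expects. The helper name build_target and the exact set of accepted file suffixes are assumptions made for this example, not part of the tool.

```python
from pathlib import Path

# Suffixes for the formats listed as supported. The exact spellings the tool
# accepts (for example ".jpg" vs ".jpeg", ".tif" vs ".tiff") are an assumption.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".webp", ".pdf"}


def build_target(paths):
    """Join file paths into a comma-separated target string (hypothetical helper).

    Raises ValueError for unsupported extensions, for paths containing a comma
    (which could not be told apart from the separator), or for an empty list.
    """
    cleaned = []
    for raw in paths:
        path = str(raw).strip()
        if not path:
            continue  # skip blank entries
        if "," in path:
            raise ValueError(f"path contains a comma: {path!r}")
        if Path(path).suffix.lower() not in SUPPORTED_SUFFIXES:
            raise ValueError(f"unsupported file type: {path!r}")
        cleaned.append(path)
    if not cleaned:
        raise ValueError("at least one input file is required")
    return ",".join(cleaned)


print(build_target(["/path/to/screenshot.png", "/path/to/document.pdf"]))
```

Validating before the tool runs keeps an unsupported file from failing the whole batch later in the workflow.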

Configuration

quality: Controls the recognition quality preset. Two options are available:
- fast (default): Uses the crnn_mobilenet_v3_small model (~0.2s/page, 2.1M parameters). Best suited for clean UI text, large fonts, and printed documents where speed is prioritized.
- accurate: Uses the parseq transformer model (~6s/page, 23.8M parameters). Significantly better on small or dense text, special characters, monospace content, and mixed-format documents such as JSON payloads or terminal output.

straighten: Straightens pages before text recognition. Useful for rotated or skewed images.
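
To make the speed trade-off between the two presets concrete, here is a rough Python sketch that estimates processing time from the approximate per-page figures quoted above and picks a preset within a time budget. The function names and the fall-back-to-fast policy are assumptions of this example, not behavior of the tool, and real timings will vary with hardware and page content.

```python
# Approximate per-page latency taken from the preset descriptions.
# These are rough figures, not guarantees.
SECONDS_PER_PAGE = {"fast": 0.2, "accurate": 6.0}


def estimate_seconds(page_count, quality="fast"):
    """Return a rough processing-time estimate in seconds."""
    if quality not in SECONDS_PER_PAGE:
        raise ValueError(f"unknown quality preset: {quality!r}")
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * SECONDS_PER_PAGE[quality]


def pick_quality(page_count, budget_seconds, needs_accuracy):
    """Choose 'accurate' only when it is wanted and fits the time budget.

    Falling back to 'fast' is a policy chosen for this sketch.
    """
    if needs_accuracy and estimate_seconds(page_count, "accurate") <= budget_seconds:
        return "accurate"
    return "fast"


pages = 50
for preset in ("fast", "accurate"):
    print(preset, round(estimate_seconds(pages, preset), 1), "seconds")
print(pick_quality(pages, budget_seconds=60, needs_accuracy=True))
```

At roughly 30 times the per-page cost, the accurate preset is worth budgeting for explicitly when a workflow processes long PDFs.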

Output Data

The tool outputs a JSON file containing extracted text at multiple granularity levels:

full_text: All extracted text concatenated across pages
pages: Per-page breakdown with line-by-line text
blocks: Individual word-level results with confidence scores and bounding box geometry (normalized coordinates)
word_count: Total number of words detected
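
A downstream node might consume a single result object along these lines. This is a sketch only: the top-level field names come from the list above, but the keys inside each block ("text", "confidence", "geometry") and the [[x_min, y_min], [x_max, y_max]] layout are assumptions, and the sample data is hand-written for illustration. Check the tool's actual output before relying on these names.

```python
# Hand-written sample in the assumed shape; not real tool output.
SAMPLE_RESULT = {
    "full_text": "Error 500",
    "word_count": 2,
    "blocks": [
        # Assumed block layout: text, confidence, normalized corner points.
        {"text": "Error", "confidence": 0.98, "geometry": [[0.1, 0.2], [0.5, 0.4]]},
        {"text": "500", "confidence": 0.41, "geometry": [[0.6, 0.2], [0.8, 0.4]]},
    ],
}


def low_confidence_words(result, threshold=0.5):
    """Return the text of words whose confidence falls below the threshold."""
    return [
        block["text"]
        for block in result.get("blocks", [])
        if block["confidence"] < threshold
    ]


def to_pixels(geometry, width, height):
    """Scale a normalized box to pixel coordinates for a page of the given size."""
    (x_min, y_min), (x_max, y_max) = geometry
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


print(low_confidence_words(SAMPLE_RESULT))
print(to_pixels(SAMPLE_RESULT["blocks"][0]["geometry"], width=1000, height=500))
```

Flagging low-confidence words before pattern matching helps avoid acting on misread characters, and pixel coordinates are useful when highlighting a match on the original image.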

For multiple input files, the output is a JSON array with one result object per file.

Updated: 2026-04-01
