Google Cloud Vision Integration Guide
Overview
The Google Cloud Vision integration allows your NINA workflows to extract text from images using Google's Cloud Vision OCR capabilities. This integration enables you to perform optical character recognition on images provided as base64 content, Google Cloud Storage URIs, or public URLs, directly from your workflows.
Status
At present, our integration supports key OCR functionalities, including:
- Text Detection: Extract sparse text from images such as signs, labels, license plates, and short captions
- Document Text Detection: Extract dense text from scanned documents, book pages, and receipts with full structural layout (pages, blocks, paragraphs, words, symbols)
- Language Hints: Provide language hints to improve OCR accuracy for specific languages
- Multiple Image Sources: Support for base64 encoded images, Google Cloud Storage URIs, and public image URLs
Credential Configuration
Before using the Google Cloud Vision integration in your workflows, you need to configure credentials for authentication. The NINA platform supports two authentication methods for Google Cloud Vision:
Authentication Methods
1. API Key Authentication
The simplest method is to use a Google Cloud Vision API key:
| Field | Description | Example |
|---|---|---|
| API Key | Google Cloud Vision API key for authentication | AIzaSyC4E1Pz... |
| Auth Type | Authentication type | apiKey |
How to get your API Key:
- Go to the Google Cloud Console
- Create a new project or select an existing one
- Enable the Cloud Vision API for your project ("APIs & Services" > "Library" > search "Cloud Vision API")
- Navigate to "APIs & Services" > "Credentials"
- Click "Create Credentials" and select "API Key"
- Copy the generated API key
- (Optional) Restrict the API key to the Cloud Vision API only for security
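Under the hood, API-key authentication amounts to appending the key to Cloud Vision's public images:annotate endpoint. The sketch below builds such a request without sending it; YOUR_API_KEY and the image URL are placeholders, and the endpoint and body shape follow the public Cloud Vision REST API rather than anything NINA-specific:

```python
import json

# Minimal sketch of a Cloud Vision request authenticated with an API key.
# "YOUR_API_KEY" is a placeholder; the endpoint and body shape follow the
# public images:annotate REST API.
API_KEY = "YOUR_API_KEY"
ENDPOINT = f"https://vision.googleapis.com/v1/images:annotate?key={API_KEY}"

body = {
    "requests": [
        {
            "image": {"source": {"imageUri": "https://example.com/images/store-sign.jpg"}},
            "features": [{"type": "TEXT_DETECTION"}],
        }
    ]
}

# This JSON string would be POSTed to ENDPOINT with Content-Type: application/json.
payload = json.dumps(body)
```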
2. OAuth2 Authentication
For more advanced scenarios or when you need enhanced security:
| Field | Description | Example |
|---|---|---|
| Client ID | OAuth2 client ID | 1234567890-abc123def456.apps.googleusercontent.com |
| Client Secret | OAuth2 client secret | GOCSPX-abc123def456ghi789jkl |
| Scope | OAuth2 scope for Google Cloud Vision | https://www.googleapis.com/auth/cloud-vision |
| Auth URL | Authorization URL | https://accounts.google.com/o/oauth2/v2/auth |
| Access Token URL | Access Token URL | https://oauth2.googleapis.com/token |
| Auth Type | Authentication type | oauth2 |
How to set up OAuth2:
- Go to the Google Cloud Console
- Navigate to "APIs & Services" > "Credentials"
- Click "Create Credentials" and select "OAuth 2.0 Client IDs"
- Configure the OAuth consent screen if prompted
- Choose "Web application" as the application type
- Add authorized redirect URIs:
  - For POC environment: https://poc.zynap.com/api/v1/oauth2/callback
  - For Production environment: https://platform.zynap.com/api/v1/oauth2/callback
- Note your Client ID and Client Secret
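For reference, the authorization URL that kicks off this flow is assembled from the credential fields above. This sketch uses the example Client ID and the Production redirect URI from this guide as placeholders; the parameter names follow Google's standard OAuth 2.0 web-server flow:

```python
from urllib.parse import urlencode

# Sketch: building the OAuth2 authorization URL from the credential fields.
# The client_id and redirect_uri values are the placeholders from this guide.
AUTH_URL = "https://accounts.google.com/o/oauth2/v2/auth"

params = {
    "client_id": "1234567890-abc123def456.apps.googleusercontent.com",
    "redirect_uri": "https://platform.zynap.com/api/v1/oauth2/callback",
    "response_type": "code",
    "scope": "https://www.googleapis.com/auth/cloud-vision",
    "access_type": "offline",  # request a refresh token as well
}

authorization_url = f"{AUTH_URL}?{urlencode(params)}"
```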
Creating a Google Cloud Vision Credential
- Navigate to the Credentials section in NINA
- Click Add New Credential
- Fill in the credential details:
- Name: A descriptive name (e.g., "Google Cloud Vision Production")
- Description: Optional details about the credential's purpose
- Integration Service: Select "Google Cloud Vision"
- Auth Type: Choose "apiKey" or "oauth2"
- Fill in the authentication fields based on your selected auth type
- Click Test Connection to verify credentials
- Click Save to store the credential
Supported Resources and Operations
The Google Cloud Vision integration supports the following resources and operations:
Image
| Operation | Description |
|---|---|
| Detect Text | Detect and extract sparse text from an image (best for signs, labels, license plates, short captions) |
| Detect Document Text | Extract dense text from a document image with full structural layout (best for scanned documents, book pages, receipts) |
Parameter Merging
The Google Cloud Vision integration takes full advantage of NINA's parameter merging capabilities:
Parameter Sources (in order of precedence)
- Node Parameters: Parameters configured directly in the Google Cloud Vision Integration Node
- Extracted Parameters: Parameters automatically extracted from the input data
- Input Data: The complete input data from upstream nodes
When a Google Cloud Vision Integration Node executes:
- It combines parameters from all sources
- Node parameters take precedence over extracted parameters
- The combined parameters are used to execute the OCR operation
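The precedence rules above behave like layered dictionary merges, where later layers win. The function name below is hypothetical, for illustration only, and not NINA's actual API:

```python
# Illustrative sketch of parameter merging: later dicts win, so node
# parameters override extracted parameters, which override raw input data.
def merge_parameters(input_data, extracted, node_params):
    merged = {}
    merged.update(input_data)   # lowest precedence
    merged.update(extracted)    # overrides input data
    merged.update(node_params)  # highest precedence
    return merged

merged = merge_parameters(
    {"imageUrl": "https://example.com/a.jpg", "languageHints": "en"},
    {"imageUrl": "https://example.com/b.jpg"},
    {"languageHints": "ja,en"},
)
# imageUrl comes from the extracted layer, languageHints from the node
```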
Image Source Types
The integration supports three ways to provide images for OCR processing:
1. Base64 Encoded Image
Provide the image as a base64-encoded string (without the data:image/...;base64, prefix). This is ideal when the image comes from a previous workflow node (e.g., a file download or API response).
2. Google Cloud Storage URI
Reference an image stored in Google Cloud Storage using a gs:// URI (e.g., gs://my-bucket/images/document.jpg). This is best for images already stored in GCS and avoids transferring large payloads.
3. Public Image URL
Provide a publicly accessible HTTP/HTTPS URL to the image. Google's servers will fetch the image directly. The URL must be accessible from Google's infrastructure.
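The three source types map naturally onto the two shapes the Vision REST API accepts for an image (inline content vs. a source URI). The mapping below is a sketch, assuming the parameter names from this guide; it is not necessarily how the integration implements it internally:

```python
# Sketch: mapping each imageSource value to a Vision API "image" object.
# Parameter names (imageSource, imageContent, gcsUri, imageUrl) come from
# this guide; the output shape follows the public REST API.
def build_image(params):
    source = params["imageSource"]
    if source == "base64":
        return {"content": params["imageContent"]}          # inline base64
    if source == "gcsUri":
        return {"source": {"imageUri": params["gcsUri"]}}   # gs:// URI
    if source == "url":
        return {"source": {"imageUri": params["imageUrl"]}} # public URL
    raise ValueError(f"Unknown imageSource: {source}")
```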
Examples
Detect Text from a Public Image URL
Extract text from a sign or label in an image:
{
"resource": "image",
"operation": "detectText",
"parameters": {
"imageSource": "url",
"imageUrl": "https://example.com/images/store-sign.jpg"
}
}
Response:
{
"featureType": "TEXT_DETECTION",
"text": "OPEN\nMon-Fri 9am-6pm",
"detectedLanguage": "en",
"annotations": [
{
"description": "OPEN",
"boundingPoly": {
"vertices": [
{"x": 10, "y": 20},
{"x": 80, "y": 20},
{"x": 80, "y": 50},
{"x": 10, "y": 50}
]
}
}
]
}
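A downstream node would typically iterate over the annotations array. This sketch (the function name is illustrative) pulls each annotation's text and top-left corner from a response shaped like the one above:

```python
# Sketch: extract each annotation's text and top-left corner from a
# Detect Text response.
def words_with_positions(response):
    results = []
    for annotation in response.get("annotations", []):
        vertices = annotation["boundingPoly"]["vertices"]
        top_left = (vertices[0]["x"], vertices[0]["y"])
        results.append((annotation["description"], top_left))
    return results

response = {
    "annotations": [
        {"description": "OPEN",
         "boundingPoly": {"vertices": [
             {"x": 10, "y": 20}, {"x": 80, "y": 20},
             {"x": 80, "y": 50}, {"x": 10, "y": 50}]}}
    ]
}
# words_with_positions(response) → [("OPEN", (10, 20))]
```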
Detect Document Text from a Base64 Image
Extract structured text from a scanned document:
{
"resource": "image",
"operation": "detectDocumentText",
"parameters": {
"imageSource": "base64",
"imageContent": "iVBORw0KGgoAAAANSUhEUgAA..."
}
}
Response:
{
"featureType": "DOCUMENT_TEXT_DETECTION",
"text": "Invoice #12345\nDate: 2024-01-15\nAmount: $1,250.00\n...",
"detectedLanguage": "en",
"annotations": [...],
"fullTextAnnotation": {
"text": "Invoice #12345\nDate: 2024-01-15\nAmount: $1,250.00\n...",
"pages": [
{
"width": 2480,
"height": 3508,
"blocks": [
{
"blockType": "TEXT",
"boundingBox": {...},
"paragraphs": [
{
"boundingBox": {...},
"words": [
{
"boundingBox": {...},
"symbols": [
{"text": "I", "boundingBox": {...}},
{"text": "n", "boundingBox": {...}}
]
}
]
}
]
}
]
}
]
}
}
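Consuming the fullTextAnnotation hierarchy means walking pages, blocks, paragraphs, words, and symbols in turn. A minimal sketch, assuming the response shape shown above:

```python
# Sketch: rebuild words by walking the page > block > paragraph > word >
# symbol hierarchy of a fullTextAnnotation object.
def collect_words(full_text_annotation):
    words = []
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # each word's text is the concatenation of its symbols
                    text = "".join(s["text"] for s in word.get("symbols", []))
                    words.append(text)
    return words

sample = {"pages": [{"blocks": [{"paragraphs": [{"words": [
    {"symbols": [{"text": "I"}, {"text": "n"}]}
]}]}]}]}
# collect_words(sample) → ["In"]
```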
Detect Text from Google Cloud Storage
Extract text from an image stored in GCS:
{
"resource": "image",
"operation": "detectText",
"parameters": {
"imageSource": "gcsUri",
"gcsUri": "gs://my-bucket/images/receipt.jpg"
}
}
Detect Text with Language Hints
Improve accuracy for specific languages by providing BCP-47 language hints:
{
"resource": "image",
"operation": "detectDocumentText",
"parameters": {
"imageSource": "url",
"imageUrl": "https://example.com/images/japanese-menu.jpg",
"languageHints": "ja,en"
}
}
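In the Vision REST API, language hints travel in an imageContext object as a list of BCP-47 codes. This sketch (an assumption about the internal conversion, not confirmed NINA code) shows how a comma-separated languageHints string would map onto that shape:

```python
# Sketch: convert the comma-separated languageHints parameter into the
# imageContext object the Vision REST API expects.
def build_image_context(language_hints):
    hints = [h.strip() for h in language_hints.split(",") if h.strip()]
    return {"languageHints": hints} if hints else {}

# build_image_context("ja,en") → {"languageHints": ["ja", "en"]}
```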
Response Structure
Detect Text (TEXT_DETECTION)
| Field | Type | Description |
|---|---|---|
| featureType | string | Always "TEXT_DETECTION" |
| text | string | Full concatenated text extracted from the image |
| detectedLanguage | string | BCP-47 language code auto-detected from the text |
| annotations | array | Individual word/block annotations with bounding box coordinates |
Detect Document Text (DOCUMENT_TEXT_DETECTION)
Returns all fields from Detect Text, plus:
| Field | Type | Description |
|---|---|---|
| fullTextAnnotation | object | Structured layout with hierarchical text organization |
| fullTextAnnotation.text | string | Full extracted text |
| fullTextAnnotation.pages | array | Page-level data with dimensions |
| fullTextAnnotation.pages[].blocks | array | Text blocks with bounding boxes |
| fullTextAnnotation.pages[].blocks[].paragraphs | array | Paragraphs within blocks |
| fullTextAnnotation.pages[].blocks[].paragraphs[].words | array | Words within paragraphs |
| fullTextAnnotation.pages[].blocks[].paragraphs[].words[].symbols | array | Individual characters |
Bounding Box Format
Each annotation includes a boundingPoly with vertex coordinates:
{
"boundingPoly": {
"vertices": [
{"x": 0, "y": 0},
{"x": 100, "y": 0},
{"x": 100, "y": 50},
{"x": 0, "y": 50}
]
}
}
Vertices represent the four corners of the bounding rectangle (top-left, top-right, bottom-right, bottom-left).
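When downstream logic needs a simple rectangle (for cropping or overlay drawing), the four vertices can be collapsed into left/top/width/height. Note that the Vision API may omit zero-valued x or y keys, so the sketch defaults missing coordinates to 0:

```python
# Sketch: collapse a boundingPoly into an axis-aligned rectangle.
# Missing x/y keys default to 0, since the API omits zero values.
def bounding_rect(bounding_poly):
    xs = [v.get("x", 0) for v in bounding_poly["vertices"]]
    ys = [v.get("y", 0) for v in bounding_poly["vertices"]]
    return {"left": min(xs), "top": min(ys),
            "width": max(xs) - min(xs), "height": max(ys) - min(ys)}

poly = {"vertices": [{"x": 0, "y": 0}, {"x": 100, "y": 0},
                     {"x": 100, "y": 50}, {"x": 0, "y": 50}]}
# bounding_rect(poly) → {"left": 0, "top": 0, "width": 100, "height": 50}
```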
Use Cases
Invoice and Receipt Processing
Extract text from scanned invoices for automated data entry:
{
"resource": "image",
"operation": "detectDocumentText",
"parameters": {
"imageSource": "base64",
"imageContent": "<base64-encoded-invoice-image>"
}
}
ID Document Verification
Extract text from identity documents for verification workflows:
{
"resource": "image",
"operation": "detectText",
"parameters": {
"imageSource": "base64",
"imageContent": "<base64-encoded-id-image>",
"languageHints": "en"
}
}
Multilingual Document Processing
Process documents in multiple languages with language hints:
{
"resource": "image",
"operation": "detectDocumentText",
"parameters": {
"imageSource": "url",
"imageUrl": "https://example.com/multilingual-document.png",
"languageHints": "en,es,fr"
}
}
Security Operations - Screenshot Analysis
Extract text from screenshots captured during incident investigations:
{
"resource": "image",
"operation": "detectDocumentText",
"parameters": {
"imageSource": "base64",
"imageContent": "<base64-encoded-screenshot>"
}
}
Choosing Between Detect Text and Detect Document Text
| Criteria | Detect Text | Detect Document Text |
|---|---|---|
| Best for | Signs, labels, license plates, short text | Scanned documents, book pages, receipts |
| Text density | Sparse text | Dense text |
| Output | Flat text + word annotations | Structured layout (pages > blocks > paragraphs > words > symbols) |
| Bounding boxes | Word-level | Character-level |
| Use when | You need the raw text quickly | You need to understand document structure |
Best Practices
Image Quality
- Resolution: Use images of at least 1024x768 pixels for best results
- Clarity: Ensure text is in focus and well-lit
- Orientation: The API handles rotated text, but upright images produce better results
- Contrast: High contrast between text and background improves accuracy
Performance Optimization
- Choose the Right Feature: Use detectText for sparse text (faster) and detectDocumentText only when you need structural layout
- Image Size: Keep images under 20 MB; resize large images before processing
- Base64 vs URL: Use image URLs when possible to reduce payload size
- GCS for Batch: If processing many images, upload them to GCS first and use gcsUri
Language Hints
- Auto-detection works well: Leave languageHints empty for most cases
- Use hints for ambiguous scripts: Helpful when the image contains text in scripts shared by multiple languages
- Multiple hints: Provide multiple comma-separated hints when the image contains multilingual text
Troubleshooting
Common Issues
| Issue | Resolution |
|---|---|
| Authentication failed (401/403) | Verify API key or OAuth credentials; ensure Cloud Vision API is enabled in your Google Cloud project |
| No text detected | Check image quality, resolution, and contrast; ensure the image contains readable text |
| Poor accuracy | Provide language hints; use higher resolution images; try detectDocumentText instead of detectText |
| Image too large | Resize the image to under 20 MB; reduce resolution while maintaining readability |
| Invalid base64 | Ensure base64 string does not include the data:image/...;base64, prefix |
| GCS URI not accessible | Verify the service account or API key has read access to the GCS bucket |
| URL not accessible | Ensure the image URL is publicly accessible from Google's servers |
| Rate limit exceeded | Implement backoff strategy; Cloud Vision allows 1800 text detection requests per minute |
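For the rate-limit case, a standard exponential backoff schedule looks like the sketch below. The base delay and cap are arbitrary illustrative choices, not values mandated by the Cloud Vision API:

```python
import random

# Sketch: exponential backoff schedule for retrying rate-limited (429)
# requests. Base delay and cap are illustrative, not API-mandated.
def backoff_delays(attempts, base=1.0, cap=60.0, jitter=False):
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))  # 1s, 2s, 4s, ... capped
        if jitter:
            delay = random.uniform(0, delay)     # "full jitter" variant
        delays.append(delay)
    return delays

# backoff_delays(5) → [1.0, 2.0, 4.0, 8.0, 16.0]
```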
Error Types
The integration returns categorized errors:
- CREDENTIAL_ERROR: Invalid or expired credentials
- INTEGRATION_ERROR: API-level errors (e.g., invalid image format, unsupported feature)
- TIMEOUT_ERROR: Request timed out
Supported Image Formats
JPEG, PNG (8-bit and 24-bit), GIF, BMP, WEBP, RAW, ICO
Note: PDF and TIFF are supported only for the first page in synchronous mode. Multi-page document processing is planned for a future release.
Limits and Quotas
| Limit | Value |
|---|---|
| Max image file size | 20 MB |
| Max JSON request size | 10 MB |
| Max image dimensions (OCR) | 75 million pixels |
| Recommended resolution | 1024x768 or higher |
| Rate limit (text detection) | 1800 requests/minute |
Pricing
- $1.50 per 1,000 units (TEXT_DETECTION or DOCUMENT_TEXT_DETECTION)
- First 1,000 units per month are free
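Using the figures above, a monthly cost estimate is straightforward: subtract the free tier, then price the remainder per thousand units.

```python
# Sketch: estimate monthly OCR cost from the pricing above
# (first 1,000 units free, then $1.50 per 1,000 units).
def estimate_monthly_cost(units, free_units=1000, price_per_1000=1.50):
    billable = max(0, units - free_units)
    return billable / 1000 * price_per_1000

# estimate_monthly_cost(11000) → 15.0 (USD)
```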
Additional Resources
- Google Cloud Vision API Documentation
- OCR Guide (Text Detection)
- Cloud Vision API Pricing
- Supported Languages
- API Quotas and Limits
For additional support or questions about the Google Cloud Vision integration, consult the NINA documentation or contact your system administrator. Updated: 2026-04-01