Google Cloud Vision Integration Guide

Overview

The Google Cloud Vision integration allows your NINA workflows to extract text from images using Google's Cloud Vision OCR capabilities. This integration enables you to perform optical character recognition on images provided as base64 content, Google Cloud Storage URIs, or public URLs, directly from your workflows.

Status

At present, our integration supports key OCR functionalities, including:

  • Text Detection: Extract sparse text from images such as signs, labels, license plates, and short captions
  • Document Text Detection: Extract dense text from scanned documents, book pages, and receipts with full structural layout (pages, blocks, paragraphs, words, symbols)
  • Language Hints: Provide language hints to improve OCR accuracy for specific languages
  • Multiple Image Sources: Support for base64 encoded images, Google Cloud Storage URIs, and public image URLs

Credential Configuration

Before using the Google Cloud Vision integration in your workflows, you need to configure credentials for authentication. The NINA platform supports two authentication methods for Google Cloud Vision:

Authentication Methods

1. API Key Authentication

The simplest method is to use a Google Cloud Vision API key:

| Field | Description | Example |
|-------|-------------|---------|
| API Key | Google Cloud Vision API key for authentication | `AIzaSyC4E1Pz...` |
| Auth Type | Authentication type | `apiKey` |

How to get your API Key:

  1. Go to the Google Cloud Console
  2. Create a new project or select an existing one
  3. Enable the Cloud Vision API for your project ("APIs & Services" > "Library" > search "Cloud Vision API")
  4. Navigate to "APIs & Services" > "Credentials"
  5. Click "Create Credentials" and select "API Key"
  6. Copy the generated API key
  7. (Optional) Restrict the API key to the Cloud Vision API only for security
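Under the hood, the API key authenticates a plain REST call to the Cloud Vision `images:annotate` endpoint. As a rough illustration only (the NINA node builds an equivalent request for you; `build_annotate_request` is a hypothetical helper), a Detect Text request looks like:

```python
import json

# Public Cloud Vision REST endpoint; the API key travels as a query parameter.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(
    api_key: str, image_url: str, feature: str = "TEXT_DETECTION"
) -> tuple[str, bytes]:
    """Return (url, json_body) for a Vision annotate call.

    Hypothetical helper: the NINA node constructs an equivalent
    request internally when it executes a Detect Text operation.
    """
    url = f"{VISION_ENDPOINT}?key={api_key}"
    body = json.dumps({
        "requests": [{
            "image": {"source": {"imageUri": image_url}},
            "features": [{"type": feature}],
        }]
    }).encode("utf-8")
    return url, body
```

POSTing that body with a `Content-Type: application/json` header is all the key-based flow requires; no token exchange is involved.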

2. OAuth2 Authentication

For more advanced scenarios or when you need enhanced security:

| Field | Description | Example |
|-------|-------------|---------|
| Client ID | OAuth2 client ID | `1234567890-abc123def456.apps.googleusercontent.com` |
| Client Secret | OAuth2 client secret | `GOCSPX-abc123def456ghi789jkl` |
| Scope | OAuth2 scope for Google Cloud Vision | `https://www.googleapis.com/auth/cloud-vision` |
| Auth URL | Authorization URL | `https://accounts.google.com/o/oauth2/v2/auth` |
| Access Token URL | Access token URL | `https://oauth2.googleapis.com/token` |
| Auth Type | Authentication type | `oauth2` |

How to set up OAuth2:

  1. Go to the Google Cloud Console
  2. Navigate to "APIs & Services" > "Credentials"
  3. Click "Create Credentials" and select "OAuth 2.0 Client IDs"
  4. Configure the OAuth consent screen if prompted
  5. Choose "Web application" as the application type
  6. Add authorized redirect URIs:
    • For POC environment: https://poc.zynap.com/api/v1/oauth2/callback
    • For Production environment: https://platform.zynap.com/api/v1/oauth2/callback
  7. Note your Client ID and Client Secret

Creating a Google Cloud Vision Credential

  1. Navigate to the Credentials section in NINA
  2. Click Add New Credential
  3. Fill in the credential details:
    • Name: A descriptive name (e.g., "Google Cloud Vision Production")
    • Description: Optional details about the credential's purpose
    • Integration Service: Select "Google Cloud Vision"
    • Auth Type: Choose "apiKey" or "oauth2"
    • Fill in the authentication fields based on your selected auth type
  4. Click Test Connection to verify credentials
  5. Click Save to store the credential

Supported Resources and Operations

The Google Cloud Vision integration supports the following resources and operations:

Image

| Operation | Description |
|-----------|-------------|
| Detect Text | Detect and extract sparse text from an image (best for signs, labels, license plates, short captions) |
| Detect Document Text | Extract dense text from a document image with full structural layout (best for scanned documents, book pages, receipts) |

Parameter Merging

The Google Cloud Vision integration takes full advantage of NINA's parameter merging capabilities:

Parameter Sources (in order of precedence)

  1. Node Parameters: Parameters configured directly in the Google Cloud Vision Integration Node
  2. Extracted Parameters: Parameters automatically extracted from the input data
  3. Input Data: The complete input data from upstream nodes

When a Google Cloud Vision Integration Node executes:

  • It combines parameters from all sources
  • Node parameters take precedence over extracted parameters
  • The combined parameters are used to execute the OCR operation
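The precedence rule amounts to a dictionary merge where node parameters win. A minimal sketch (not NINA's actual implementation; `merge_parameters` is a hypothetical name):

```python
def merge_parameters(node_params: dict, extracted_params: dict) -> dict:
    """Combine parameter sources; node parameters override extracted ones."""
    merged = dict(extracted_params)  # start from the lower-precedence source
    merged.update(node_params)       # node parameters take precedence
    return merged
```

For example, a node-configured `imageSource` would override one extracted from upstream input, while extracted-only keys pass through untouched.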

Image Source Types

The integration supports three ways to provide images for OCR processing:

1. Base64 Encoded Image

Provide the image as a base64-encoded string (without the data:image/...;base64, prefix). This is ideal when the image comes from a previous workflow node (e.g., a file download or API response).
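Encoding an image file for this source type is a one-liner with the standard library; note that the raw base64 string is passed as-is, with no data-URI prefix:

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its raw base64 string
    (no "data:image/...;base64," prefix)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```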

2. Google Cloud Storage URI

Reference an image stored in Google Cloud Storage using a gs:// URI (e.g., gs://my-bucket/images/document.jpg). This is best for images already stored in GCS and avoids transferring large payloads.

3. Public Image URL

Provide a publicly accessible HTTP/HTTPS URL to the image. Google's servers will fetch the image directly. The URL must be accessible from Google's infrastructure.

Examples

Detect Text from a Public Image URL

Extract text from a sign or label in an image:

```json
{
  "resource": "image",
  "operation": "detectText",
  "parameters": {
    "imageSource": "url",
    "imageUrl": "https://example.com/images/store-sign.jpg"
  }
}
```

Response:

```json
{
  "featureType": "TEXT_DETECTION",
  "text": "OPEN\nMon-Fri 9am-6pm",
  "detectedLanguage": "en",
  "annotations": [
    {
      "description": "OPEN",
      "boundingPoly": {
        "vertices": [
          {"x": 10, "y": 20},
          {"x": 80, "y": 20},
          {"x": 80, "y": 50},
          {"x": 10, "y": 50}
        ]
      }
    }
  ]
}
```
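Downstream nodes can post-process the `annotations` array directly. For instance, this sketch sorts word annotations into rough reading order using each polygon's top-left vertex (it assumes the response shape shown above and roughly horizontal text):

```python
def words_in_reading_order(annotations: list) -> list:
    """Sort word annotations top-to-bottom, then left-to-right,
    keyed on the first (top-left) vertex of each bounding polygon.
    Missing coordinates default to 0, as Vision omits zero values."""
    def key(a):
        v = a["boundingPoly"]["vertices"][0]
        return (v.get("y", 0), v.get("x", 0))
    return [a["description"] for a in sorted(annotations, key=key)]
```

This is a heuristic, not part of the integration: multi-column layouts or rotated text need the structured output of Detect Document Text instead.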

Detect Document Text from a Base64 Image

Extract structured text from a scanned document:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "iVBORw0KGgoAAAANSUhEUgAA..."
  }
}
```

Response:

```json
{
  "featureType": "DOCUMENT_TEXT_DETECTION",
  "text": "Invoice #12345\nDate: 2024-01-15\nAmount: $1,250.00\n...",
  "detectedLanguage": "en",
  "annotations": [...],
  "fullTextAnnotation": {
    "text": "Invoice #12345\nDate: 2024-01-15\nAmount: $1,250.00\n...",
    "pages": [
      {
        "width": 2480,
        "height": 3508,
        "blocks": [
          {
            "blockType": "TEXT",
            "boundingBox": {...},
            "paragraphs": [
              {
                "boundingBox": {...},
                "words": [
                  {
                    "boundingBox": {...},
                    "symbols": [
                      {"text": "I", "boundingBox": {...}},
                      {"text": "n", "boundingBox": {...}}
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}
```

Detect Text from Google Cloud Storage

Extract text from an image stored in GCS:

```json
{
  "resource": "image",
  "operation": "detectText",
  "parameters": {
    "imageSource": "gcsUri",
    "gcsUri": "gs://my-bucket/images/receipt.jpg"
  }
}
```

Detect Text with Language Hints

Improve accuracy for specific languages by providing BCP-47 language hints:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "url",
    "imageUrl": "https://example.com/images/japanese-menu.jpg",
    "languageHints": "ja,en"
  }
}
```

Response Structure

Detect Text (TEXT_DETECTION)

| Field | Type | Description |
|-------|------|-------------|
| `featureType` | string | Always `"TEXT_DETECTION"` |
| `text` | string | Full concatenated text extracted from the image |
| `detectedLanguage` | string | BCP-47 language code auto-detected from the text |
| `annotations` | array | Individual word/block annotations with bounding box coordinates |

Detect Document Text (DOCUMENT_TEXT_DETECTION)

Returns all fields from Detect Text, plus:

| Field | Type | Description |
|-------|------|-------------|
| `fullTextAnnotation` | object | Structured layout with hierarchical text organization |
| `fullTextAnnotation.text` | string | Full extracted text |
| `fullTextAnnotation.pages` | array | Page-level data with dimensions |
| `fullTextAnnotation.pages[].blocks` | array | Text blocks with bounding boxes |
| `fullTextAnnotation.pages[].blocks[].paragraphs` | array | Paragraphs within blocks |
| `fullTextAnnotation.pages[].blocks[].paragraphs[].words` | array | Words within paragraphs |
| `fullTextAnnotation.pages[].blocks[].paragraphs[].words[].symbols` | array | Individual characters |
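Walking this hierarchy is a matter of four nested loops. A minimal sketch that reconstructs each word from its symbols (it assumes the field names listed above; `iter_words` is a hypothetical helper, not part of the integration):

```python
def iter_words(full_text_annotation: dict):
    """Yield each detected word as a string by walking the hierarchy
    pages > blocks > paragraphs > words > symbols."""
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word's text is the concatenation of its symbols
                    yield "".join(s["text"] for s in word.get("symbols", []))
```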

Bounding Box Format

Each annotation includes a boundingPoly with vertex coordinates:

```json
{
  "boundingPoly": {
    "vertices": [
      {"x": 0, "y": 0},
      {"x": 100, "y": 0},
      {"x": 100, "y": 50},
      {"x": 0, "y": 50}
    ]
  }
}
```

Vertices represent the four corners of the bounding rectangle (top-left, top-right, bottom-right, bottom-left).
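If a downstream node needs an `(x, y, width, height)` rectangle rather than a polygon, the conversion is straightforward. One detail worth handling: the Vision API may omit a coordinate whose value is 0, so default missing keys to 0 (a small sketch; `to_rect` is a hypothetical helper):

```python
def to_rect(bounding_poly: dict):
    """Convert a boundingPoly to an (x, y, width, height) tuple.
    Coordinates absent from a vertex are treated as 0."""
    xs = [v.get("x", 0) for v in bounding_poly["vertices"]]
    ys = [v.get("y", 0) for v in bounding_poly["vertices"]]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)
```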

Use Cases

Invoice and Receipt Processing

Extract text from scanned invoices for automated data entry:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "<base64-encoded-invoice-image>"
  }
}
```

ID Document Verification

Extract text from identity documents for verification workflows:

```json
{
  "resource": "image",
  "operation": "detectText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "<base64-encoded-id-image>",
    "languageHints": "en"
  }
}
```

Multilingual Document Processing

Process documents in multiple languages with language hints:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "url",
    "imageUrl": "https://example.com/multilingual-document.png",
    "languageHints": "en,es,fr"
  }
}
```

Security Operations - Screenshot Analysis

Extract text from screenshots captured during incident investigations:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "<base64-encoded-screenshot>"
  }
}
```

Choosing Between Detect Text and Detect Document Text

| Criteria | Detect Text | Detect Document Text |
|----------|-------------|----------------------|
| Best for | Signs, labels, license plates, short text | Scanned documents, book pages, receipts |
| Text density | Sparse text | Dense text |
| Output | Flat text + word annotations | Structured layout (pages > blocks > paragraphs > words > symbols) |
| Bounding boxes | Word-level | Character-level |
| Use when | You need the raw text quickly | You need to understand document structure |

Best Practices

Image Quality

  1. Resolution: Use images of at least 1024x768 pixels for best results
  2. Clarity: Ensure text is in focus and well-lit
  3. Orientation: The API handles rotated text, but upright images produce better results
  4. Contrast: High contrast between text and background improves accuracy

Performance Optimization

  1. Choose the Right Feature: Use detectText for sparse text (faster) and detectDocumentText only when you need structural layout
  2. Image Size: Keep images under 20 MB; resize large images before processing
  3. Base64 vs URL: Use image URLs when possible to reduce payload size
  4. GCS for Batch: If processing many images, upload them to GCS first and use gcsUri

Language Hints

  1. Auto-detection works well: Leave languageHints empty for most cases
  2. Use hints for ambiguous scripts: Helpful when the image contains text in scripts shared by multiple languages
  3. Multiple hints: Provide multiple comma-separated hints when the image contains multilingual text

Troubleshooting

Common Issues

| Issue | Resolution |
|-------|------------|
| Authentication failed (401/403) | Verify API key or OAuth credentials; ensure the Cloud Vision API is enabled in your Google Cloud project |
| No text detected | Check image quality, resolution, and contrast; ensure the image contains readable text |
| Poor accuracy | Provide language hints; use higher-resolution images; try `detectDocumentText` instead of `detectText` |
| Image too large | Resize the image to under 20 MB; reduce resolution while maintaining readability |
| Invalid base64 | Ensure the base64 string does not include the `data:image/...;base64,` prefix |
| GCS URI not accessible | Verify the service account or API key has read access to the GCS bucket |
| URL not accessible | Ensure the image URL is publicly accessible from Google's servers |
| Rate limit exceeded | Implement a backoff strategy; Cloud Vision allows 1800 text detection requests per minute |
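For the rate-limit case, retrying with exponential backoff plus jitter is the standard remedy. A minimal sketch under stated assumptions (`RateLimitError` is a stand-in for however your workflow surfaces HTTP 429 responses, not an exception the integration defines):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 (rate limit exceeded) response."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Invoke `call`, retrying on rate-limit errors with exponentially
    growing delays (base_delay, 2x, 4x, ...) plus random jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Jitter spreads simultaneous retries from parallel workflow runs so they do not hammer the quota in lockstep.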

Error Types

The integration returns categorized errors:

  • CREDENTIAL_ERROR: Invalid or expired credentials
  • INTEGRATION_ERROR: API-level errors (e.g., invalid image format, unsupported feature)
  • TIMEOUT_ERROR: Request timed out

Supported Image Formats

JPEG, PNG (8-bit and 24-bit), GIF, BMP, WEBP, RAW, ICO

Note: PDF and TIFF are supported only for the first page in synchronous mode. Multi-page document processing is planned for a future release.

Limits and Quotas

| Limit | Value |
|-------|-------|
| Max image file size | 20 MB |
| Max JSON request size | 10 MB |
| Max image dimensions (OCR) | 75 million pixels |
| Recommended resolution | 1024x768 or higher |
| Rate limit (text detection) | 1800 requests/minute |

Pricing

  • $1.50 per 1,000 units (TEXT_DETECTION or DOCUMENT_TEXT_DETECTION)
  • First 1,000 units per month are free

Additional Resources

For additional support or questions about the Google Cloud Vision integration, consult the NINA documentation or contact your system administrator.

Updated: 2026-04-01