Google Cloud Vision Integration Guide

Overview

The Google Cloud Vision integration allows your NINA workflows to extract text from images using Google's Cloud Vision OCR capabilities. This integration enables you to perform optical character recognition on images provided as base64 content, Google Cloud Storage URIs, or public URLs, directly from your workflows.

Status

At present, our integration supports key OCR functionalities, including:

  • Text Detection: Extract sparse text from images such as signs, labels, license plates, and short captions
  • Document Text Detection: Extract dense text from scanned documents, book pages, and receipts with full structural layout (pages, blocks, paragraphs, words, symbols)
  • Language Hints: Provide language hints to improve OCR accuracy for specific languages
  • Multiple Image Sources: Support for base64 encoded images, Google Cloud Storage URIs, and public image URLs

Credential Configuration

Before using the Google Cloud Vision integration in your workflows, you need to configure credentials for authentication. The NINA platform supports two authentication methods for Google Cloud Vision:

Authentication Methods

1. API Key Authentication

The simplest method is to use a Google Cloud Vision API key:

| Field | Description | Example |
|-------|-------------|---------|
| API Key | Google Cloud Vision API key for authentication | `AIzaSyC4E1Pz...` |
| Auth Type | Authentication type | `apiKey` |

How to get your API Key:

  1. Go to the Google Cloud Console
  2. Create a new project or select an existing one
  3. Enable the Cloud Vision API for your project ("APIs & Services" > "Library" > search "Cloud Vision API")
  4. Navigate to "APIs & Services" > "Credentials"
  5. Click "Create Credentials" and select "API Key"
  6. Copy the generated API key
  7. (Optional) Restrict the API key to the Cloud Vision API only for security
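Under the hood, the API key authenticates a plain REST call to the Cloud Vision `images:annotate` endpoint. As a rough illustration only (the NINA node builds an equivalent request for you; `build_annotate_request` is a hypothetical helper), a Detect Text request looks like:

```python
import json

# Public Cloud Vision REST endpoint; the API key travels as a query parameter.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(
    api_key: str, image_url: str, feature: str = "TEXT_DETECTION"
) -> tuple[str, bytes]:
    """Return (url, json_body) for a Vision annotate call.

    Hypothetical helper: the NINA node constructs an equivalent
    request internally when it executes a Detect Text operation.
    """
    url = f"{VISION_ENDPOINT}?key={api_key}"
    body = json.dumps({
        "requests": [{
            "image": {"source": {"imageUri": image_url}},
            "features": [{"type": feature}],
        }]
    }).encode("utf-8")
    return url, body
```

POSTing that body with a `Content-Type: application/json` header is all the key-based flow requires; no token exchange is involved.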

2. OAuth2 Authentication

For more advanced scenarios or when you need enhanced security:

| Field | Description | Example |
|-------|-------------|---------|
| Client ID | OAuth2 client ID | `1234567890-abc123def456.apps.googleusercontent.com` |
| Client Secret | OAuth2 client secret | `GOCSPX-abc123def456ghi789jkl` |
| Scope | OAuth2 scope for Google Cloud Vision | `https://www.googleapis.com/auth/cloud-vision` |
| Auth URL | Authorization URL | `https://accounts.google.com/o/oauth2/v2/auth` |
| Access Token URL | Access token URL | `https://oauth2.googleapis.com/token` |
| Auth Type | Authentication type | `oauth2` |

How to set up OAuth2:

  1. Go to the Google Cloud Console
  2. Navigate to "APIs & Services" > "Credentials"
  3. Click "Create Credentials" and select "OAuth 2.0 Client IDs"
  4. Configure the OAuth consent screen if prompted
  5. Choose "Web application" as the application type
  6. Add authorized redirect URIs:
    • For POC environment: https://poc.zynap.com/api/v1/oauth2/callback
    • For Production environment: https://platform.zynap.com/api/v1/oauth2/callback
  7. Note your Client ID and Client Secret

Creating a Google Cloud Vision Credential

  1. Navigate to the Credentials section in NINA
  2. Click Add New Credential
  3. Fill in the credential details:
    • Name: A descriptive name (e.g., "Google Cloud Vision Production")
    • Description: Optional details about the credential's purpose
    • Integration Service: Select "Google Cloud Vision"
    • Auth Type: Choose "apiKey" or "oauth2"
    • Fill in the authentication fields based on your selected auth type
  4. Click Test Connection to verify credentials
  5. Click Save to store the credential

Supported Resources and Operations

The Google Cloud Vision integration supports the following resources and operations:

Image

| Operation | Description |
|-----------|-------------|
| Detect Text | Detect and extract sparse text from an image (best for signs, labels, license plates, short captions) |
| Detect Document Text | Extract dense text from a document image with full structural layout (best for scanned documents, book pages, receipts) |

Parameter Merging

The Google Cloud Vision integration takes full advantage of NINA's parameter merging capabilities:

Parameter Sources (in order of precedence)

  1. Node Parameters: Parameters configured directly in the Google Cloud Vision Integration Node
  2. Extracted Parameters: Parameters automatically extracted from the input data
  3. Input Data: The complete input data from upstream nodes

When a Google Cloud Vision Integration Node executes:

  • It combines parameters from all sources
  • Node parameters take precedence over extracted parameters
  • The combined parameters are used to execute the OCR operation
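The precedence rule amounts to a dictionary merge where node parameters win. A minimal sketch (not NINA's actual implementation; `merge_parameters` is a hypothetical name):

```python
def merge_parameters(node_params: dict, extracted_params: dict) -> dict:
    """Combine parameter sources; node parameters override extracted ones."""
    merged = dict(extracted_params)  # start from the lower-precedence source
    merged.update(node_params)       # node parameters take precedence
    return merged
```

For example, a node-configured `imageSource` would override one extracted from upstream input, while extracted-only keys pass through untouched.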

Image Source Types

The integration supports three ways to provide images for OCR processing:

1. Base64 Encoded Image

Provide the image as a base64-encoded string (without the data:image/...;base64, prefix). This is ideal when the image comes from a previous workflow node (e.g., a file download or API response).
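Encoding an image file for this source type is a one-liner with the standard library; note that the raw base64 string is passed as-is, with no data-URI prefix:

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return its raw base64 string
    (no "data:image/...;base64," prefix)."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```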

2. Google Cloud Storage URI

Reference an image stored in Google Cloud Storage using a gs:// URI (e.g., gs://my-bucket/images/document.jpg). This is best for images already stored in GCS and avoids transferring large payloads.

3. Public Image URL

Provide a publicly accessible HTTP/HTTPS URL to the image. Google's servers will fetch the image directly. The URL must be accessible from Google's infrastructure.

Examples

Detect Text from a Public Image URL

Extract text from a sign or label in an image:

```json
{
  "resource": "image",
  "operation": "detectText",
  "parameters": {
    "imageSource": "url",
    "imageUrl": "https://example.com/images/store-sign.jpg"
  }
}
```

Response:

```json
{
  "featureType": "TEXT_DETECTION",
  "text": "OPEN\nMon-Fri 9am-6pm",
  "detectedLanguage": "en",
  "annotations": [
    {
      "description": "OPEN",
      "boundingPoly": {
        "vertices": [
          {"x": 10, "y": 20},
          {"x": 80, "y": 20},
          {"x": 80, "y": 50},
          {"x": 10, "y": 50}
        ]
      }
    }
  ]
}
```
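Downstream nodes can post-process the `annotations` array directly. For instance, this sketch sorts word annotations into rough reading order using each polygon's top-left vertex (it assumes the response shape shown above and roughly horizontal text):

```python
def words_in_reading_order(annotations: list) -> list:
    """Sort word annotations top-to-bottom, then left-to-right,
    keyed on the first (top-left) vertex of each bounding polygon.
    Missing coordinates default to 0, as Vision omits zero values."""
    def key(a):
        v = a["boundingPoly"]["vertices"][0]
        return (v.get("y", 0), v.get("x", 0))
    return [a["description"] for a in sorted(annotations, key=key)]
```

This is a heuristic, not part of the integration: multi-column layouts or rotated text need the structured output of Detect Document Text instead.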

Detect Document Text from a Base64 Image

Extract structured text from a scanned document:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "iVBORw0KGgoAAAANSUhEUgAA..."
  }
}
```

Response:

```json
{
  "featureType": "DOCUMENT_TEXT_DETECTION",
  "text": "Invoice #12345\nDate: 2024-01-15\nAmount: $1,250.00\n...",
  "detectedLanguage": "en",
  "annotations": [...],
  "fullTextAnnotation": {
    "text": "Invoice #12345\nDate: 2024-01-15\nAmount: $1,250.00\n...",
    "pages": [
      {
        "width": 2480,
        "height": 3508,
        "blocks": [
          {
            "blockType": "TEXT",
            "boundingBox": {...},
            "paragraphs": [
              {
                "boundingBox": {...},
                "words": [
                  {
                    "boundingBox": {...},
                    "symbols": [
                      {"text": "I", "boundingBox": {...}},
                      {"text": "n", "boundingBox": {...}}
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}
```

Detect Text from Google Cloud Storage

Extract text from an image stored in GCS:

```json
{
  "resource": "image",
  "operation": "detectText",
  "parameters": {
    "imageSource": "gcsUri",
    "gcsUri": "gs://my-bucket/images/receipt.jpg"
  }
}
```

Detect Text with Language Hints

Improve accuracy for specific languages by providing BCP-47 language hints:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "url",
    "imageUrl": "https://example.com/images/japanese-menu.jpg",
    "languageHints": "ja,en"
  }
}
```

Response Structure

Detect Text (TEXT_DETECTION)

| Field | Type | Description |
|-------|------|-------------|
| `featureType` | string | Always `"TEXT_DETECTION"` |
| `text` | string | Full concatenated text extracted from the image |
| `detectedLanguage` | string | BCP-47 language code auto-detected from the text |
| `annotations` | array | Individual word/block annotations with bounding box coordinates |

Detect Document Text (DOCUMENT_TEXT_DETECTION)

Returns all fields from Detect Text, plus:

| Field | Type | Description |
|-------|------|-------------|
| `fullTextAnnotation` | object | Structured layout with hierarchical text organization |
| `fullTextAnnotation.text` | string | Full extracted text |
| `fullTextAnnotation.pages` | array | Page-level data with dimensions |
| `fullTextAnnotation.pages[].blocks` | array | Text blocks with bounding boxes |
| `fullTextAnnotation.pages[].blocks[].paragraphs` | array | Paragraphs within blocks |
| `fullTextAnnotation.pages[].blocks[].paragraphs[].words` | array | Words within paragraphs |
| `fullTextAnnotation.pages[].blocks[].paragraphs[].words[].symbols` | array | Individual characters |
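Walking this hierarchy is a matter of four nested loops. A minimal sketch that reconstructs each word from its symbols (it assumes the field names listed above; `iter_words` is a hypothetical helper, not part of the integration):

```python
def iter_words(full_text_annotation: dict):
    """Yield each detected word as a string by walking the hierarchy
    pages > blocks > paragraphs > words > symbols."""
    for page in full_text_annotation.get("pages", []):
        for block in page.get("blocks", []):
            for paragraph in block.get("paragraphs", []):
                for word in paragraph.get("words", []):
                    # A word's text is the concatenation of its symbols
                    yield "".join(s["text"] for s in word.get("symbols", []))
```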

Bounding Box Format

Each annotation includes a boundingPoly with vertex coordinates:

```json
{
  "boundingPoly": {
    "vertices": [
      {"x": 0, "y": 0},
      {"x": 100, "y": 0},
      {"x": 100, "y": 50},
      {"x": 0, "y": 50}
    ]
  }
}
```

Vertices represent the four corners of the bounding rectangle (top-left, top-right, bottom-right, bottom-left).
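If a downstream node needs an `(x, y, width, height)` rectangle rather than a polygon, the conversion is straightforward. One detail worth handling: the Vision API may omit a coordinate whose value is 0, so default missing keys to 0 (a small sketch; `to_rect` is a hypothetical helper):

```python
def to_rect(bounding_poly: dict):
    """Convert a boundingPoly to an (x, y, width, height) tuple.
    Coordinates absent from a vertex are treated as 0."""
    xs = [v.get("x", 0) for v in bounding_poly["vertices"]]
    ys = [v.get("y", 0) for v in bounding_poly["vertices"]]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)
```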

Use Cases

Invoice and Receipt Processing

Extract text from scanned invoices for automated data entry:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "<base64-encoded-invoice-image>"
  }
}
```

ID Document Verification

Extract text from identity documents for verification workflows:

```json
{
  "resource": "image",
  "operation": "detectText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "<base64-encoded-id-image>",
    "languageHints": "en"
  }
}
```

Multilingual Document Processing

Process documents in multiple languages with language hints:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "url",
    "imageUrl": "https://example.com/multilingual-document.png",
    "languageHints": "en,es,fr"
  }
}
```

Security Operations - Screenshot Analysis

Extract text from screenshots captured during incident investigations:

```json
{
  "resource": "image",
  "operation": "detectDocumentText",
  "parameters": {
    "imageSource": "base64",
    "imageContent": "<base64-encoded-screenshot>"
  }
}
```

Choosing Between Detect Text and Detect Document Text

| Criteria | Detect Text | Detect Document Text |
|----------|-------------|----------------------|
| Best for | Signs, labels, license plates, short text | Scanned documents, book pages, receipts |
| Text density | Sparse text | Dense text |
| Output | Flat text + word annotations | Structured layout (pages > blocks > paragraphs > words > symbols) |
| Bounding boxes | Word-level | Character-level |
| Use when | You need the raw text quickly | You need to understand document structure |

Best Practices

Image Quality

  1. Resolution: Use images of at least 1024x768 pixels for best results
  2. Clarity: Ensure text is in focus and well-lit
  3. Orientation: The API handles rotated text, but upright images produce better results
  4. Contrast: High contrast between text and background improves accuracy

Performance Optimization

  1. Choose the Right Feature: Use detectText for sparse text (faster) and detectDocumentText only when you need structural layout
  2. Image Size: Keep images under 20 MB; resize large images before processing
  3. Base64 vs URL: Use image URLs when possible to reduce payload size
  4. GCS for Batch: If processing many images, upload them to GCS first and use gcsUri

Language Hints

  1. Auto-detection works well: Leave languageHints empty for most cases
  2. Use hints for ambiguous scripts: Helpful when the image contains text in scripts shared by multiple languages
  3. Multiple hints: Provide multiple comma-separated hints when the image contains multilingual text

Troubleshooting

Common Issues

| Issue | Resolution |
|-------|------------|
| Authentication failed (401/403) | Verify API key or OAuth credentials; ensure the Cloud Vision API is enabled in your Google Cloud project |
| No text detected | Check image quality, resolution, and contrast; ensure the image contains readable text |
| Poor accuracy | Provide language hints; use higher-resolution images; try `detectDocumentText` instead of `detectText` |
| Image too large | Resize the image to under 20 MB; reduce resolution while maintaining readability |
| Invalid base64 | Ensure the base64 string does not include the `data:image/...;base64,` prefix |
| GCS URI not accessible | Verify the service account or API key has read access to the GCS bucket |
| URL not accessible | Ensure the image URL is publicly accessible from Google's servers |
| Rate limit exceeded | Implement a backoff strategy; Cloud Vision allows 1800 text detection requests per minute |
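For the rate-limit case, retrying with exponential backoff plus jitter is the standard remedy. A minimal sketch under stated assumptions (`RateLimitError` is a stand-in for however your workflow surfaces HTTP 429 responses, not an exception the integration defines):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 (rate limit exceeded) response."""

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Invoke `call`, retrying on rate-limit errors with exponentially
    growing delays (base_delay, 2x, 4x, ...) plus random jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Jitter spreads simultaneous retries from parallel workflow runs so they do not hammer the quota in lockstep.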

Error Types

The integration returns categorized errors:

  • CREDENTIAL_ERROR: Invalid or expired credentials
  • INTEGRATION_ERROR: API-level errors (e.g., invalid image format, unsupported feature)
  • TIMEOUT_ERROR: Request timed out

Supported Image Formats

JPEG, PNG (8-bit and 24-bit), GIF, BMP, WEBP, RAW, ICO

Note: PDF and TIFF are supported only for the first page in synchronous mode. Multi-page document processing is planned for a future release.

Limits and Quotas

| Limit | Value |
|-------|-------|
| Max image file size | 20 MB |
| Max JSON request size | 10 MB |
| Max image dimensions (OCR) | 75 million pixels |
| Recommended resolution | 1024x768 or higher |
| Rate limit (text detection) | 1800 requests/minute |

Pricing

  • $1.50 per 1,000 units (TEXT_DETECTION or DOCUMENT_TEXT_DETECTION)
  • First 1,000 units per month are free

Additional Resources

For additional support or questions about the Google Cloud Vision integration, consult the NINA documentation or contact your system administrator.

Updated: 2026-04-01