Apify Integration Guide

Overview

The Apify integration allows your NINA workflows to interact with the Apify platform for web scraping, automation, and data extraction tasks. Apify provides a cloud platform for running web scrapers and automation bots (called Actors), storing their output in Datasets and Key-Value Stores, and orchestrating runs via Tasks and Schedules.

Status

Supported resources and operations:

  • Actor: Get Actor, Start Actor, Call Actor (Blocking), List Actors
  • Run: Get Run, Abort Run, Wait For Run To Finish, List Runs
  • Dataset: Get Dataset, List Dataset Items, Push Items To Dataset, List Datasets
  • Key-Value Store: Get Record, Set Record, List Keys, List Key-Value Stores
  • Task: Get Task, Start Task, Call Task (Blocking), List Tasks
  • Schedule: Get Schedule, List Schedules, Create Schedule, Update Schedule, Delete Schedule

Advanced features:

  • Resource Locators: Select Actors, Tasks, and Datasets from dropdown lists or enter IDs directly
  • Blocking Calls: Call an Actor or Task and wait for it to finish within the same workflow step
  • Dataset Streaming: List and push structured data items to named Datasets
  • Key-Value Storage: Read and write arbitrary data by key in Key-Value Stores
  • Schedule Management: Create and manage cron-based schedules for automated Actor/Task runs

Credential Configuration

Authentication Method

API Token

| Field | Description | Example |
| --- | --- | --- |
| API Token | Your Apify personal API token | apify_api_xxxxxxxxxxxxxxxx |

How to Get Your Apify API Token

  1. Log in to your Apify Console
  2. Click on your avatar / profile in the top-right corner
  3. Select Settings from the dropdown menu
  4. Navigate to the Integrations tab
  5. Copy the Personal API token shown on the page

Note: Keep your API token secret. Anyone with this token can run Actors and access your data on Apify.

Creating an Apify Credential

  1. Navigate to the Credentials section in NINA
  2. Click Add New Credential
  3. Fill in:
    • Integration Service: "Apify"
    • Auth Type: "API Token"
    • API Token: Your Apify personal API token
  4. Click Save

NINA will verify the token by calling the /users/me endpoint before saving.
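
To make the verification step concrete, here is a minimal sketch of the request such a check could send. It assumes the public Apify API base URL https://api.apify.com/v2 and Bearer-token authentication; the function name build_verify_request is hypothetical, and NINA's actual implementation may differ.

```python
import urllib.request

# Assumption: the public Apify API base URL.
APIFY_API_BASE = "https://api.apify.com/v2"


def build_verify_request(api_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET /users/me request for the given token."""
    if not api_token or not api_token.strip():
        raise ValueError("API token must be a non-empty string")
    return urllib.request.Request(
        url=f"{APIFY_API_BASE}/users/me",
        headers={"Authorization": f"Bearer {api_token.strip()}"},
        method="GET",
    )
```

Passing the returned request to urllib.request.urlopen would perform the actual check; a rejected token is expected to surface as an HTTP 401 error.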

Supported Resources and Operations

Actor

Actors are cloud programs (web scrapers, automation bots, data processors) hosted on Apify.

| Operation | Name | Description |
| --- | --- | --- |
| get | Get Actor | Get details of a specific Apify Actor |
| start | Start Actor | Start an Actor run asynchronously with optional input |
| call | Call Actor (Blocking) | Start an Actor and wait for it to finish |
| list | List Actors | List your own Actors |

Run

Runs are individual executions of an Actor.

| Operation | Name | Description |
| --- | --- | --- |
| get | Get Run | Get details of a specific Actor run |
| abort | Abort Run | Abort a running Actor run (graceful or immediate) |
| waitForFinish | Wait For Run To Finish | Poll a run until it reaches a terminal state |
| list | List Runs | List Actor runs |

Dataset

Datasets store structured output data produced by Actor runs.

| Operation | Name | Description |
| --- | --- | --- |
| get | Get Dataset | Get details of a specific dataset |
| listItems | List Dataset Items | List items stored in a dataset |
| pushItems | Push Items To Dataset | Push new items to a dataset |
| list | List Datasets | List your named Datasets |

Key-Value Store

Key-Value Stores hold arbitrary data (files, JSON blobs, etc.) indexed by a string key.

| Operation | Name | Description |
| --- | --- | --- |
| getRecord | Get Record | Get a record from a Key-Value Store by key |
| setRecord | Set Record | Set (create or overwrite) a record in a Key-Value Store |
| listKeys | List Keys | List all keys in a Key-Value Store |
| list | List Key-Value Stores | List your named Key-Value Stores |

Task

Tasks are saved configurations of an Actor — like bookmarked runs with a fixed input.

| Operation | Name | Description |
| --- | --- | --- |
| get | Get Task | Get details of a specific Task |
| start | Start Task | Start a Task run asynchronously with optional input override |
| call | Call Task (Blocking) | Start a Task and wait for it to finish |
| list | List Tasks | List your own Actor Tasks |

Schedule

Schedules automatically trigger Actor or Task runs on a cron expression.

| Operation | Name | Description |
| --- | --- | --- |
| get | Get Schedule | Get details of a specific Schedule |
| list | List Schedules | List all Schedules |
| create | Create Schedule | Create a new Schedule |
| update | Update Schedule | Update an existing Schedule |
| delete | Delete Schedule | Delete a Schedule |

Examples

Get Actor Details

{
  "integration_service": "apify",
  "resource": "actor",
  "operation": "get",
  "actorId": "apify~web-scraper"
}

Start an Actor Asynchronously

{
  "integration_service": "apify",
  "resource": "actor",
  "operation": "start",
  "actorId": "apify~web-scraper",
  "input": {
    "startUrls": [{"url": "https://example.com"}],
    "maxCrawlPages": 10
  },
  "memory": 1024,
  "timeout": 300,
  "waitForFinish": 30
}

Note: waitForFinish (0–300 s) tells Apify to hold the HTTP response until the run finishes or the wait expires. Set to 0 to return immediately with the run ID.

Call an Actor and Wait for It to Finish

{
  "integration_service": "apify",
  "resource": "actor",
  "operation": "call",
  "actorId": "apify~web-scraper",
  "input": {
    "startUrls": [{"url": "https://example.com"}],
    "maxCrawlPages": 5
  },
  "timeout": 120
}

Warning: call blocks the workflow step until the Actor finishes or times out. Use it only for short-running Actors (ones that reliably finish in under about 2 minutes) to avoid SQS visibility timeout issues.

Get a Specific Run

{
  "integration_service": "apify",
  "resource": "run",
  "operation": "get",
  "runId": "HG7ML7M8z78YcAPEB"
}

Wait For a Run to Finish

{
  "integration_service": "apify",
  "resource": "run",
  "operation": "waitForFinish",
  "runId": "HG7ML7M8z78YcAPEB",
  "timeout": 300
}

Abort a Run Gracefully

{
  "integration_service": "apify",
  "resource": "run",
  "operation": "abort",
  "runId": "HG7ML7M8z78YcAPEB",
  "gracefully": true
}

List Dataset Items

{
  "integration_service": "apify",
  "resource": "dataset",
  "operation": "listItems",
  "datasetId": "rHuMdwm6xCFt6WiGU",
  "offset": 0,
  "limit": 100,
  "clean": true
}

Push Items to a Dataset

{
  "integration_service": "apify",
  "resource": "dataset",
  "operation": "pushItems",
  "datasetId": "rHuMdwm6xCFt6WiGU",
  "items": [
    {"url": "https://example.com", "title": "Example Domain"},
    {"url": "https://example.org", "title": "Example Organisation"}
  ]
}

Get a Record from a Key-Value Store

{
  "integration_service": "apify",
  "resource": "keyValueStore",
  "operation": "getRecord",
  "storeId": "Mn4maDeTFKkVua4mz",
  "key": "OUTPUT"
}

Set a Record in a Key-Value Store

{
  "integration_service": "apify",
  "resource": "keyValueStore",
  "operation": "setRecord",
  "storeId": "Mn4maDeTFKkVua4mz",
  "key": "my-config",
  "value": {"threshold": 0.9, "enabled": true},
  "contentType": "application/json"
}

Start a Task with Input Override

{
  "integration_service": "apify",
  "resource": "task",
  "operation": "start",
  "taskId": "my-scraping-task",
  "input": {
    "maxCrawlPages": 20
  }
}

Call a Task (Blocking)

{
  "integration_service": "apify",
  "resource": "task",
  "operation": "call",
  "taskId": "my-scraping-task",
  "timeout": 60
}

Create a Schedule

{
  "integration_service": "apify",
  "resource": "schedule",
  "operation": "create",
  "name": "Daily Scrape",
  "cronExpression": "0 8 * * *",
  "isEnabled": true,
  "actions": [
    {
      "type": "RUN_ACTOR_TASK",
      "id": "my-scraping-task"
    }
  ]
}
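
The cronExpression above uses the standard five-field cron layout. As an illustration only, the following sketch names each field so the schedule is easier to read; the helper split_cron is hypothetical and handles only the plain five-field form, not any shortcut syntax.

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict:
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))
```

For "0 8 * * *" this gives minute 0 and hour 8 with every other field set to *, which means the schedule fires once a day at 08:00 in the schedule's timezone.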

Delete a Schedule

{
  "integration_service": "apify",
  "resource": "schedule",
  "operation": "delete",
  "scheduleId": "Zs9XMpkFHjq5jB6yd"
}

Parameter Reference

Common Pagination Parameters

| Parameter | Type | Default | Max | Description |
| --- | --- | --- | --- | --- |
| offset | number | 0 |  | Number of records to skip |
| limit | number | 100 | 1000 | Maximum number of records to return |
| desc | boolean | false |  | Sort in descending order (runs only) |
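
A minimal sketch of paging with offset and limit follows. The callable fetch_page is a hypothetical stand-in for whatever executes a list operation (for example dataset.listItems) and returns one page of items. The loop stops on the first empty page instead of the first short page, on the assumption that filtered listings such as clean: true may return fewer items than limit.

```python
from typing import Callable, Iterator, List


def iter_pages(fetch_page: Callable[[int, int], List[dict]],
               limit: int = 100) -> Iterator[dict]:
    """Yield every item, advancing offset by limit until a page is empty."""
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:  # empty page: nothing left to fetch
            break
        yield from page
        offset += limit
```

The trade-off is one extra request at the end of every listing in exchange for not stopping early on a filtered page.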

Actor / Task Run Parameters

| Parameter | Type | Default | Range | Description |
| --- | --- | --- | --- | --- |
| input | json |  |  | Input data for the Actor or Task run |
| build | string |  |  | Actor build tag or number (e.g. latest, 1.2.34) |
| timeout | number | 120 | 0–3600 | Timeout in seconds |
| memory | number |  | 128–32768 | Allocated memory in MB |
| waitForFinish | number | 0 | 0–300 | Seconds to wait before returning the run ID |
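
The ranges in this table can be checked before a workflow step is submitted. The sketch below is one way to do that; check_run_params is a hypothetical helper rather than part of NINA, and it checks only the numeric ranges listed above.

```python
# Inclusive ranges taken from the parameter table above.
RUN_PARAM_RANGES = {
    "timeout": (0, 3600),       # seconds
    "memory": (128, 32768),     # MB
    "waitForFinish": (0, 300),  # seconds
}


def check_run_params(params: dict) -> list:
    """Return a list of problems; an empty list means every value is in range."""
    problems = []
    for name, (low, high) in RUN_PARAM_RANGES.items():
        if name not in params:
            continue  # optional parameter, nothing to check
        value = params[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every out-of-range value at once.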

Actor ID Format

Apify Actor IDs use a tilde (~) as separator between username and actor name:

{username}~{actor-name}
# e.g. apify~web-scraper

You can also use the Actor's unique ID, the alphanumeric string that the API returns as the Actor's id.
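
As a sketch of the username~actor-name format, the helpers below build and validate such an ID. They assume the character set named in the Troubleshooting table (alphanumerics, hyphens, underscores, dots); both function names are hypothetical, and from_store_path assumes the username/actor-name form in which Actors are shown in the Apify Store.

```python
import re

# Allowed characters, a tilde separator, then allowed characters again.
_ACTOR_ID_RE = re.compile(r"[A-Za-z0-9._-]+~[A-Za-z0-9._-]+")


def to_actor_id(username: str, actor_name: str) -> str:
    """Join username and Actor name with a tilde and validate the result."""
    actor_id = f"{username}~{actor_name}"
    if not _ACTOR_ID_RE.fullmatch(actor_id):
        raise ValueError(f"invalid Actor ID: {actor_id!r}")
    return actor_id


def from_store_path(path: str) -> str:
    """Convert 'username/actor-name' into 'username~actor-name'."""
    username, sep, actor_name = path.partition("/")
    if not sep:
        raise ValueError(f"expected 'username/actor-name', got {path!r}")
    return to_actor_id(username, actor_name)
```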

Run Status Reference

| Status | Description |
| --- | --- |
| READY | Waiting to be assigned to a worker |
| RUNNING | Currently executing |
| SUCCEEDED | Completed successfully |
| FAILED | Completed with an error |
| TIMED-OUT | Exceeded the timeout |
| ABORTED | Manually stopped |

Terminal states (run will not change): SUCCEEDED, FAILED, TIMED-OUT, ABORTED.
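
The terminal-state rule can be expressed as a small polling loop. This is a sketch under stated assumptions, not the integration's actual waitForFinish logic: get_status is a hypothetical callable that returns the current status string of a run, and sleep and clock are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable

TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"})


def is_terminal(status: str) -> bool:
    """True once a run can no longer change state."""
    return status in TERMINAL_STATUSES


def poll_until_terminal(get_status: Callable[[], str],
                        timeout: float = 300.0,
                        interval: float = 5.0,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Call get_status until it is terminal or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if is_terminal(status) or clock() >= deadline:
            return status  # may still be non-terminal if the deadline passed
        sleep(interval)
```

Callers should check is_terminal on the returned status, because a run that outlives the deadline comes back as READY or RUNNING.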

Best Practices

  1. Prefer async + waitForFinish over call: Use start with a short waitForFinish for most workflows. Reserve call only for Actors that reliably finish in under 2 minutes.

  2. Poll with waitForFinish: For longer Actors, use start to get a runId, then use run.waitForFinish in a subsequent step after a delay.

  3. Read output from Datasets: Actors typically write output to a default Dataset. After a run, use dataset.listItems with the default dataset ID from the run response.

  4. Use Tasks for reusable configurations: Save common Actor inputs as Tasks to avoid repeating large input blobs in your workflows.

  5. Filter Dataset items with clean: Set clean: true to skip empty items and hidden metadata fields when listing dataset items.

  6. Pin the build: Set build to a specific version number (e.g. 1.2.34) to avoid unexpected behaviour from automatic Actor updates. A moving tag such as latest follows each new build, so it does not protect against such changes.

  7. Handle pagination: Default limit is 100. Use offset to page through large result sets.

  8. Respect Apify rate limits: Free accounts have API rate limits. Add delays between rapid consecutive calls in high-volume workflows.
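
Practices 1 to 3 combine into a three-step pattern: start the Actor, wait for the run, then read its default Dataset. The sketch below builds the three step payloads in the same shape as the examples in this guide. It assumes that each step's output exposes the Apify run object with its id and defaultDatasetId fields; the function names are hypothetical.

```python
def start_payload(actor_id: str, actor_input: dict, wait_for_finish: int = 30) -> dict:
    """Step 1: start the Actor and wait briefly in case the run is short."""
    return {
        "integration_service": "apify",
        "resource": "actor",
        "operation": "start",
        "actorId": actor_id,
        "input": actor_input,
        "waitForFinish": wait_for_finish,
    }


def wait_payload(run: dict, timeout: int = 300) -> dict:
    """Step 2: poll the run returned by step 1."""
    return {
        "integration_service": "apify",
        "resource": "run",
        "operation": "waitForFinish",
        "runId": run["id"],
        "timeout": timeout,
    }


def items_payload(run: dict, limit: int = 100) -> dict:
    """Step 3: read the finished run's default Dataset."""
    return {
        "integration_service": "apify",
        "resource": "dataset",
        "operation": "listItems",
        "datasetId": run["defaultDatasetId"],
        "offset": 0,
        "limit": limit,
        "clean": True,
    }
```

If step 1 already returns a terminal status, step 2 can be skipped and the run object passed straight to items_payload.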

Troubleshooting

| Issue | Resolution |
| --- | --- |
| 401 Unauthorized | Verify your API token is correct; regenerate it in Apify Console → Settings → Integrations |
| Actor run FAILED | Check the Actor's log in Apify Console for the error cause; verify input matches the Actor's schema |
| Actor run TIMED-OUT | Increase timeout, reduce scope of work (fewer pages, smaller input), or use a more powerful memory tier |
| Dataset items empty | The Actor may have written to a non-default dataset; check the run's defaultDatasetId field |
| actorId invalid characters | Use only alphanumeric, hyphens, underscores, and dots; separate username and actor name with ~ |
| Key-Value Store record not found | Confirm storeId and key are correct; keys are case-sensitive |
| Schedule not triggering | Verify isEnabled: true and that the cronExpression is valid; check timezone settings in Apify Console |
| Blocking call times out in workflow | The Actor took longer than timeout; switch to async start + run.waitForFinish pattern |

Workflow Context

This integration is particularly useful for:

  • Automated Web Scraping: Trigger scraping Actors from NINA workflows to collect data from websites on demand
  • Data Pipeline Orchestration: Chain Actor runs and pass output datasets downstream to other workflow nodes
  • Scheduled Data Collection: Create and manage Schedules to automatically run data collection on a recurring basis
  • ETL Workflows: Extract data via Actors, transform it in other nodes, and push results back to Datasets
  • Monitoring and Alerting: Periodically run Actors to check website changes and trigger alerts based on output
  • Competitive Intelligence: Automate market data collection and feed results into analytical workflows

Security Considerations

  1. Protect your API token: Store it only in NINA Credentials — never hardcode it in workflow parameters or logs.
  2. Scope token permissions: Apify personal tokens grant full account access. Use organisation-level tokens if available to limit blast radius.
  3. Review Actor input: Actors receive your input payload in plaintext. Avoid passing sensitive data (passwords, PII) unless the Actor is owned and audited by your team.
  4. Audit third-party Actors: Running a public Actor from the Apify Store gives it access to your Apify account context. Only use Actors from trusted publishers.
  5. Data residency: Datasets and Key-Value Stores are hosted on Apify's infrastructure. Avoid storing confidential data unless you are on a plan with data residency guarantees.
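
As one way to keep tokens out of logs (point 1), text can be passed through a redaction filter before it is written anywhere. This sketch assumes tokens follow the apify_api_ prefix shown in the credential example, followed by letters and digits; real tokens may use a different alphabet, so treat the pattern as illustrative. The function name redact_tokens is hypothetical.

```python
import re

# Assumption: tokens look like "apify_api_" followed by letters and digits.
_TOKEN_RE = re.compile(r"apify_api_[A-Za-z0-9]+")


def redact_tokens(text: str) -> str:
    """Replace anything that looks like an Apify token with a fixed marker."""
    return _TOKEN_RE.sub("apify_api_[REDACTED]", text)
```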

Updated: 2026-04-07