Apify Integration Guide
Overview
The Apify integration allows your NINA workflows to interact with the Apify platform for web scraping, automation, and data extraction tasks. Apify provides a cloud platform for running web scrapers and automation bots (called Actors), storing their output in Datasets and Key-Value Stores, and orchestrating runs via Tasks and Schedules.
Status
Supported resources and operations:
- Actor: Get Actor, Start Actor, Call Actor (Blocking), List Actors
- Run: Get Run, Abort Run, Wait For Run To Finish, List Runs
- Dataset: Get Dataset, List Dataset Items, Push Items To Dataset, List Datasets
- Key-Value Store: Get Record, Set Record, List Keys, List Key-Value Stores
- Task: Get Task, Start Task, Call Task (Blocking), List Tasks
- Schedule: Get Schedule, List Schedules, Create Schedule, Update Schedule, Delete Schedule
Advanced features:
- Resource Locators: Select Actors, Tasks, and Datasets from dropdown lists or enter IDs directly
- Blocking Calls: Call an Actor or Task and wait for it to finish within the same workflow step
- Dataset Streaming: List and push structured data items to named Datasets
- Key-Value Storage: Read and write arbitrary data by key in Key-Value Stores
- Schedule Management: Create and manage cron-based schedules for automated Actor/Task runs
Credential Configuration
Authentication Method
API Token
| Field | Description | Example |
|---|---|---|
| API Token | Your Apify personal API token | apify_api_xxxxxxxxxxxxxxxx |
How to Get Your Apify API Token
- Log in to your Apify Console
- Click on your avatar / profile in the top-right corner
- Select Settings from the dropdown menu
- Navigate to the Integrations tab
- Copy the Personal API token shown on the page
Note: Keep your API token secret. Anyone with this token can run Actors and access your data on Apify.
Creating an Apify Credential
- Navigate to the Credentials section in NINA
- Click Add New Credential
- Fill in:
  - Integration Service: "Apify"
  - Auth Type: "API Token"
  - API Token: Your Apify personal API token
- Click Save
NINA will verify the token by calling the /users/me endpoint before saving.
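To check a token by hand before entering it, the same endpoint can be queried directly. The sketch below only builds the request and does not send it. It assumes the public Apify API base URL https://api.apify.com/v2 and Bearer-token authentication; the helper name build_verify_request is hypothetical, and how NINA performs its own check internally is not described in this guide.

```python
import urllib.request

APIFY_API_BASE = "https://api.apify.com/v2"  # assumption: public Apify API base URL


def build_verify_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to /users/me carrying the token."""
    if not token or token != token.strip():
        raise ValueError("token must be non-empty with no surrounding whitespace")
    return urllib.request.Request(
        f"{APIFY_API_BASE}/users/me",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    request = build_verify_request("apify_api_xxxxxxxxxxxxxxxx")  # placeholder token
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request, timeout=10)
    # An HTTPError with status 401 indicates the token was rejected.
```

Keeping the request construction separate from sending makes it easy to inspect the URL and headers without touching the network.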
Supported Resources and Operations
Actor
Actors are cloud programs (web scrapers, automation bots, data processors) hosted on Apify.
| Operation | Name | Description |
|---|---|---|
| get | Get Actor | Get details of a specific Apify Actor |
| start | Start Actor | Start an Actor run asynchronously with optional input |
| call | Call Actor (Blocking) | Start an Actor and wait for it to finish |
| list | List Actors | List your own Actors |
Run
Runs are individual executions of an Actor.
| Operation | Name | Description |
|---|---|---|
| get | Get Run | Get details of a specific Actor run |
| abort | Abort Run | Abort a running Actor run (graceful or immediate) |
| waitForFinish | Wait For Run To Finish | Poll a run until it reaches a terminal state |
| list | List Runs | List Actor runs |
Dataset
Datasets store structured output data produced by Actor runs.
| Operation | Name | Description |
|---|---|---|
| get | Get Dataset | Get details of a specific dataset |
| listItems | List Dataset Items | List items stored in a dataset |
| pushItems | Push Items To Dataset | Push new items to a dataset |
| list | List Datasets | List your named Datasets |
Key-Value Store
Key-Value Stores hold arbitrary data (files, JSON blobs, etc.) indexed by a string key.
| Operation | Name | Description |
|---|---|---|
| getRecord | Get Record | Get a record from a Key-Value Store by key |
| setRecord | Set Record | Set (create or overwrite) a record in a Key-Value Store |
| listKeys | List Keys | List all keys in a Key-Value Store |
| list | List Key-Value Stores | List your named Key-Value Stores |
Task
Tasks are saved configurations of an Actor — like bookmarked runs with a fixed input.
| Operation | Name | Description |
|---|---|---|
| get | Get Task | Get details of a specific Task |
| start | Start Task | Start a Task run asynchronously with optional input override |
| call | Call Task (Blocking) | Start a Task and wait for it to finish |
| list | List Tasks | List your own Actor Tasks |
Schedule
Schedules automatically trigger Actor or Task runs on a cron expression.
| Operation | Name | Description |
|---|---|---|
| get | Get Schedule | Get details of a specific Schedule |
| list | List Schedules | List all Schedules |
| create | Create Schedule | Create a new Schedule |
| update | Update Schedule | Update an existing Schedule |
| delete | Delete Schedule | Delete a Schedule |
Examples
Get Actor Details
{
"integration_service": "apify",
"resource": "actor",
"operation": "get",
"actorId": "apify~web-scraper"
}
Start an Actor Asynchronously
{
"integration_service": "apify",
"resource": "actor",
"operation": "start",
"actorId": "apify~web-scraper",
"input": {
"startUrls": [{"url": "https://example.com"}],
"maxCrawlPages": 10
},
"memory": 1024,
"timeout": 300,
"waitForFinish": 30
}
Note: waitForFinish (0–300 s) tells Apify to hold the HTTP response until the run finishes or the wait expires. Set to 0 to return immediately with the run ID.
Call an Actor and Wait for It to Finish
{
"integration_service": "apify",
"resource": "actor",
"operation": "call",
"actorId": "apify~web-scraper",
"input": {
"startUrls": [{"url": "https://example.com"}],
"maxCrawlPages": 5
},
"timeout": 120
}
Warning: The call operation blocks the workflow step until the Actor finishes or times out. Use it only for short-running Actors (under a few minutes) to avoid SQS visibility timeout issues.
Get a Specific Run
{
"integration_service": "apify",
"resource": "run",
"operation": "get",
"runId": "HG7ML7M8z78YcAPEB"
}
Wait For a Run to Finish
{
"integration_service": "apify",
"resource": "run",
"operation": "waitForFinish",
"runId": "HG7ML7M8z78YcAPEB",
"timeout": 300
}
Abort a Run Gracefully
{
"integration_service": "apify",
"resource": "run",
"operation": "abort",
"runId": "HG7ML7M8z78YcAPEB",
"gracefully": true
}
List Dataset Items
{
"integration_service": "apify",
"resource": "dataset",
"operation": "listItems",
"datasetId": "rHuMdwm6xCFt6WiGU",
"offset": 0,
"limit": 100,
"clean": true
}
Push Items to a Dataset
{
"integration_service": "apify",
"resource": "dataset",
"operation": "pushItems",
"datasetId": "rHuMdwm6xCFt6WiGU",
"items": [
{"url": "https://example.com", "title": "Example Domain"},
{"url": "https://example.org", "title": "Example Organisation"}
]
}
Get a Record from a Key-Value Store
{
"integration_service": "apify",
"resource": "keyValueStore",
"operation": "getRecord",
"storeId": "Mn4maDeTFKkVua4mz",
"key": "OUTPUT"
}
Set a Record in a Key-Value Store
{
"integration_service": "apify",
"resource": "keyValueStore",
"operation": "setRecord",
"storeId": "Mn4maDeTFKkVua4mz",
"key": "my-config",
"value": {"threshold": 0.9, "enabled": true},
"contentType": "application/json"
}
Start a Task with Input Override
{
"integration_service": "apify",
"resource": "task",
"operation": "start",
"taskId": "my-scraping-task",
"input": {
"maxCrawlPages": 20
}
}
Call a Task (Blocking)
{
"integration_service": "apify",
"resource": "task",
"operation": "call",
"taskId": "my-scraping-task",
"timeout": 60
}
Create a Schedule
{
"integration_service": "apify",
"resource": "schedule",
"operation": "create",
"name": "Daily Scrape",
"cronExpression": "0 8 * * *",
"isEnabled": true,
"actions": [
{
"type": "RUN_ACTOR_TASK",
"id": "my-scraping-task"
}
]
}
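The cronExpression in the example above has five space-separated fields, and "0 8 * * *" fires at 08:00 every day in the schedule's timezone. As a rough pre-flight check, the sketch below splits an expression into named fields. The helper name split_cron is hypothetical, and the check covers only the plain five-field form, not any shorthand or extended syntax the platform may accept.

```python
from typing import Dict

# Field order of a standard five-field cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> Dict[str, str]:
    """Split a five-field cron expression into a field-name -> value mapping."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    print(split_cron("0 8 * * *"))
```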
Delete a Schedule
{
"integration_service": "apify",
"resource": "schedule",
"operation": "delete",
"scheduleId": "Zs9XMpkFHjq5jB6yd"
}
Parameter Reference
Common Pagination Parameters
| Parameter | Type | Default | Max | Description |
|---|---|---|---|---|
| offset | number | 0 | — | Number of records to skip |
| limit | number | 100 | 1000 | Maximum number of records to return |
| desc | boolean | false | — | Sort in descending order (runs only) |
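Paging with offset and limit can be sketched as a loop that stops at the first short page. In the sketch below, fetch_page is a hypothetical callable standing in for whatever executes a list operation (for example dataset listItems) and returns the items of one page; only the offset and limit semantics come from the table above.

```python
from typing import Any, Callable, List

MAX_LIMIT = 1000  # maximum limit from the pagination table


def fetch_all(
    fetch_page: Callable[[int, int], List[Any]], limit: int = 100
) -> List[Any]:
    """Collect every item by advancing offset until a page comes back short."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    items: List[Any] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        items.extend(page)
        if len(page) < limit:  # short (or empty) page: nothing left to read
            return items
        offset += limit


if __name__ == "__main__":
    data = list(range(250))  # stand-in for a remote dataset
    result = fetch_all(lambda offset, limit: data[offset:offset + limit])
    print(len(result))
```

When the total is an exact multiple of limit, the loop makes one extra request that returns an empty page; that is the price of not knowing the total in advance.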
Actor / Task Run Parameters
| Parameter | Type | Default | Range | Description |
|---|---|---|---|---|
| input | json | — | — | Input data for the Actor or Task run |
| build | string | — | — | Actor build tag or number (e.g. latest, 1.2.34) |
| timeout | number | 120 | 0–3600 | Timeout in seconds |
| memory | number | — | 128–32768 | Allocated memory in MB |
| waitForFinish | number | 0 | 0–300 | Seconds to wait before returning the run ID |
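The ranges in this table can be checked before a workflow step is submitted. The sketch below is a hypothetical client-side helper (check_run_params is not part of NINA or Apify); it only mirrors the documented ranges and says nothing about how the integration itself validates input.

```python
from typing import Any, Dict, List

# Inclusive ranges copied from the Actor / Task Run Parameters table.
RANGES = {
    "timeout": (0, 3600),       # seconds
    "memory": (128, 32768),     # MB
    "waitForFinish": (0, 300),  # seconds
}


def check_run_params(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues."""
    errors: List[str] = []
    for name, (low, high) in RANGES.items():
        if name not in params:
            continue  # all three parameters are optional
        value = params[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name} must be a number")
        elif not low <= value <= high:
            errors.append(f"{name}={value} is outside {low}-{high}")
    return errors


if __name__ == "__main__":
    print(check_run_params({"timeout": 300, "memory": 64}))
```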
Actor ID Format
Apify Actor IDs use a tilde (~) as separator between username and actor name:
{username}~{actor-name}
# e.g. apify~web-scraper
You can also use the Actor's unique ID returned by the API (an alphanumeric string).
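A quick format check for the username~actor-name form might look like the sketch below. The allowed character set (alphanumerics, hyphens, underscores, dots) is taken from the Troubleshooting table in this guide; the function name is hypothetical, and bare Actor IDs without a tilde are deliberately not covered.

```python
import re

# One or more allowed characters, a single tilde, then one or more allowed characters.
ACTOR_NAME_PATTERN = re.compile(r"^[A-Za-z0-9._-]+~[A-Za-z0-9._-]+$")


def is_username_actor_form(actor_id: str) -> bool:
    """Return True if actor_id looks like {username}~{actor-name}."""
    return ACTOR_NAME_PATTERN.fullmatch(actor_id) is not None


if __name__ == "__main__":
    for candidate in ("apify~web-scraper", "apify/web-scraper", "apify~"):
        print(candidate, is_username_actor_form(candidate))
```

A common mistake is pasting the slash form shown in Apify Store URLs; the check above rejects it so the tilde can be substituted before the step runs.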
Run Status Reference
| Status | Description |
|---|---|
| READY | Waiting to be assigned to a worker |
| RUNNING | Currently executing |
| SUCCEEDED | Completed successfully |
| FAILED | Completed with an error |
| TIMED-OUT | Exceeded the timeout |
| ABORTED | Manually stopped |
Terminal states (run will not change): SUCCEEDED, FAILED, TIMED-OUT, ABORTED.
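The terminal-state rule translates directly into a polling loop. The sketch below is illustrative only: get_status is a hypothetical callable that returns the current status string of a run (for example by executing a run get step), and the interval and poll count are arbitrary defaults, not values prescribed by NINA or Apify.

```python
import time
from typing import Callable

TERMINAL_STATES = frozenset({"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"})


def is_terminal(status: str) -> bool:
    """Return True once a run can no longer change state."""
    return status in TERMINAL_STATES


def poll_until_terminal(
    get_status: Callable[[], str],
    interval: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it reports a terminal state, then return that state."""
    for _ in range(max_polls):
        status = get_status()
        if is_terminal(status):
            return status
        sleep(interval)  # wait before asking again
    raise TimeoutError(f"run still not terminal after {max_polls} polls")


if __name__ == "__main__":
    fake_statuses = iter(["READY", "RUNNING", "SUCCEEDED"])
    print(poll_until_terminal(lambda: next(fake_statuses), sleep=lambda _: None))
```

Passing sleep as a parameter keeps the loop testable without real delays; a caller should still branch on the returned state, since FAILED, TIMED-OUT, and ABORTED are terminal too.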
Best Practices
- Prefer async + waitForFinish over call: Use start with a short waitForFinish for most workflows. Reserve call for Actors that reliably finish in under 2 minutes.
- Poll with waitForFinish: For longer Actors, use start to get a runId, then use run.waitForFinish in a subsequent step after a delay.
- Read output from Datasets: Actors typically write output to a default Dataset. After a run, use dataset.listItems with the default dataset ID from the run response.
- Use Tasks for reusable configurations: Save common Actor inputs as Tasks to avoid repeating large input blobs in your workflows.
- Filter Dataset items with clean: Set clean: true to skip empty items and hidden metadata fields when listing dataset items.
- Specify the build tag: Pin runs to a specific build (for example a version number such as 1.2.34) to avoid unexpected behaviour from automatic Actor updates. A moving tag such as latest follows new builds, so it does not protect against them.
- Handle pagination: The default limit is 100. Use offset to page through large result sets.
- Respect Apify rate limits: Free accounts have API rate limits. Add delays between rapid consecutive calls in high-volume workflows.
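The start, wait, then read pattern from the first three practices can be sketched as three payload builders that mirror the JSON shapes in the Examples section. The function names are hypothetical, and the sketch assumes the run response exposes id and defaultDatasetId as top-level keys; adjust the lookups to whatever your workflow actually receives.

```python
from typing import Any, Dict


def start_payload(actor_id: str, actor_input: Dict[str, Any], wait: int = 30) -> Dict[str, Any]:
    """Step 1: start the Actor asynchronously with a short waitForFinish."""
    return {
        "integration_service": "apify",
        "resource": "actor",
        "operation": "start",
        "actorId": actor_id,
        "input": actor_input,
        "waitForFinish": wait,
    }


def wait_payload(run: Dict[str, Any], timeout: int = 300) -> Dict[str, Any]:
    """Step 2: wait for the run from step 1 (assumes the response has an 'id' key)."""
    return {
        "integration_service": "apify",
        "resource": "run",
        "operation": "waitForFinish",
        "runId": run["id"],
        "timeout": timeout,
    }


def list_items_payload(run: Dict[str, Any], offset: int = 0, limit: int = 100) -> Dict[str, Any]:
    """Step 3: read the run's default Dataset, skipping empty items via clean."""
    return {
        "integration_service": "apify",
        "resource": "dataset",
        "operation": "listItems",
        "datasetId": run["defaultDatasetId"],  # assumed key on the run response
        "offset": offset,
        "limit": limit,
        "clean": True,
    }


if __name__ == "__main__":
    run = {"id": "HG7ML7M8z78YcAPEB", "defaultDatasetId": "rHuMdwm6xCFt6WiGU"}
    print(wait_payload(run))
    print(list_items_payload(run))
```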
Troubleshooting
| Issue | Resolution |
|---|---|
| 401 Unauthorized | Verify your API token is correct; regenerate it in Apify Console → Settings → Integrations |
| Actor run FAILED | Check the Actor's log in Apify Console for the error cause; verify input matches the Actor's schema |
| Actor run TIMED-OUT | Increase timeout, reduce scope of work (fewer pages, smaller input), or use a more powerful memory tier |
| Dataset items empty | The Actor may have written to a non-default dataset; check the run's defaultDatasetId field |
| actorId invalid characters | Use only alphanumeric, hyphens, underscores, and dots; separate username and actor name with ~ |
| Key-Value Store record not found | Confirm storeId and key are correct; keys are case-sensitive |
| Schedule not triggering | Verify isEnabled: true and that the cronExpression is valid; check timezone settings in Apify Console |
| Blocking call times out in workflow | The Actor took longer than timeout; switch to async start + run.waitForFinish pattern |
Workflow Context
This integration is particularly useful for:
- Automated Web Scraping: Trigger scraping Actors from NINA workflows to collect data from websites on demand
- Data Pipeline Orchestration: Chain Actor runs and pass output datasets downstream to other workflow nodes
- Scheduled Data Collection: Create and manage Schedules to automatically run data collection on a recurring basis
- ETL Workflows: Extract data via Actors, transform it in other nodes, and push results back to Datasets
- Monitoring and Alerting: Periodically run Actors to check website changes and trigger alerts based on output
- Competitive Intelligence: Automate market data collection and feed results into analytical workflows
Security Considerations
- Protect your API token: Store it only in NINA Credentials — never hardcode it in workflow parameters or logs.
- Scope token permissions: Apify personal tokens grant full account access. Use organisation-level tokens if available to limit blast radius.
- Review Actor input: Actors receive your input payload in plaintext. Avoid passing sensitive data (passwords, PII) unless the Actor is owned and audited by your team.
- Audit third-party Actors: Running a public Actor from the Apify Store gives it access to your Apify account context. Only use Actors from trusted publishers.
- Data residency: Datasets and Key-Value Stores are hosted on Apify's infrastructure. Avoid storing confidential data unless you are on a plan with data residency guarantees.
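One way to keep tokens out of logs is to redact them before any text is written. The sketch below assumes tokens follow the apify_api_ prefix shown in the credential table and consist of letters and digits after it; both the pattern and the function name redact_tokens are illustrative assumptions, not a guarantee about every token format Apify issues.

```python
import re

# Assumption: tokens look like "apify_api_" followed by letters and digits.
TOKEN_PATTERN = re.compile(r"apify_api_[A-Za-z0-9]+")


def redact_tokens(text: str) -> str:
    """Replace anything that looks like an Apify API token with a masked form."""
    return TOKEN_PATTERN.sub("apify_api_***", text)


if __name__ == "__main__":
    print(redact_tokens("request failed, token=apify_api_abc123DEF456"))
```

Applying the redaction at the logging boundary, rather than at each call site, reduces the chance of a stray debug line leaking the credential.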
Additional Resources
- Apify API Reference
- Apify Console
- Apify Actor Store
- Apify SDK Documentation
- Cron Expression Reference
Updated: 2026-04-07