gau
The gau tool discovers known URLs for a domain from multiple sources: AlienVault's OTX, the Wayback Machine, Common Crawl, and URLScan. Within Canva's automated security workflows, its primary purpose is to gather URLs for subsequent security assessments, which shortens the reconnaissance phase.
Ideal Use Cases & Fit
gau is best suited to early-stage reconnaissance, when security teams need to enumerate potential attack surface. Given a newline-separated list of target domains, it quickly returns URLs that its providers have previously indexed for those domains. It is less useful when real-time data or live interaction with a site is required, because it reports archived and already-known URLs rather than fetching pages from the target.
Value in Workflows
In security workflows, gau automates the URL discovery phase so that teams can collect URL data quickly. It belongs at the beginning of workflows focused on vulnerability assessment and penetration testing, where its output supplies the URLs that later scanning and analysis steps work on.
Input Data
The gau tool expects its input as a list of newline-separated domains. The input has the following properties:
- Format: Newline-separated domains
- Function: target
- Required: Yes
- Example:
example.com
vulnweb.com
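When targets arrive from mixed sources (scope files, pasted URLs, spreadsheets), it can help to normalize them into the newline-separated format before passing them to gau. The following is a minimal sketch and not part of gau itself; the helper name `to_domain_list` is hypothetical, and it assumes that bare, lowercase hostnames with duplicates removed are what the input should contain.

```python
from urllib.parse import urlsplit


def to_domain_list(raw_targets):
    """Return newline-separated, de-duplicated hostnames (hypothetical helper)."""
    seen = set()
    domains = []
    for raw in raw_targets:
        candidate = raw.strip().lower()
        if not candidate:
            continue  # skip blank entries
        # urlsplit only recognizes the host when a scheme or "//" is present,
        # so prepend "//" to bare hostnames.
        if "://" not in candidate:
            candidate = "//" + candidate
        host = urlsplit(candidate).hostname
        if host and host not in seen:
            seen.add(host)
            domains.append(host)
    return "\n".join(domains)
```

Called with `["https://example.com/login", "Example.com", " vulnweb.com "]`, the helper returns the two lines shown in the example above: `example.com` and `vulnweb.com`.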
Configuration
- blacklist: Specifies a list of extensions to skip during URL fetching.
- filter-code: Excludes results that have the specified HTTP status codes.
- from: Fetches URLs starting from a specified date (format: YYYYMM).
- filter-type: Excludes results that have the specified MIME types.
- filter-params: De-duplicates URLs that point to the same endpoint but differ in their parameters.
- json: Controls whether results are written as JSON; enabled by default.
- match-code: Filters results to include only specific HTTP status codes.
- match-type: Includes only results that have the specified MIME types.
- providers: Designates which data sources to utilize (wayback, commoncrawl, otx, urlscan).
- proxy: Sets the HTTP proxy to use for requests, defaulting to PROXY_FULL.
- retries: Sets the number of retries for HTTP requests, with a default of 10.
- timeout: Configures the maximum timeout duration for HTTP client requests, defaulting to 60 seconds.
- subdomains: Optionally includes subdomains of the target domain in the search.
- to: Fetches URLs up to a specified date (format: YYYYMM).
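To illustrate how the options above might translate into a gau invocation, here is a minimal sketch. It is not how the workflow node is implemented. The helper `build_gau_args` is hypothetical, and the mapping from option names to command-line flags is an assumption that should be checked against `gau --help` for the installed version. The sketch also validates the YYYYMM date format used by `from` and `to`.

```python
import re

# Assumed mapping from the option names listed above to gau command-line
# flags; verify against `gau --help` for the installed version.
FLAG_FOR_OPTION = {
    "blacklist": "--blacklist",
    "filter-code": "--fc",
    "filter-type": "--ft",
    "match-code": "--mc",
    "match-type": "--mt",
    "providers": "--providers",
    "from": "--from",
    "to": "--to",
    "proxy": "--proxy",
    "retries": "--retries",
    "timeout": "--timeout",
}
# Options treated as on/off switches (also an assumption).
SWITCH_FOR_OPTION = {
    "filter-params": "--fp",
    "json": "--json",
    "subdomains": "--subs",
}
# Four-digit year followed by a month from 01 to 12.
YYYYMM = re.compile(r"^\d{4}(0[1-9]|1[0-2])$")


def build_gau_args(options):
    """Translate an options dict into a gau argument list (hypothetical helper)."""
    args = ["gau"]
    for name, value in options.items():
        if name in SWITCH_FOR_OPTION:
            if value:  # switches are emitted only when enabled
                args.append(SWITCH_FOR_OPTION[name])
            continue
        if name not in FLAG_FOR_OPTION:
            raise ValueError(f"unknown option: {name}")
        if name in ("from", "to") and not YYYYMM.match(str(value)):
            raise ValueError(f"{name} must use the YYYYMM format, got {value!r}")
        # List-valued options are passed as one comma-separated string.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        args.extend([FLAG_FOR_OPTION[name], str(value)])
    return args
```

Under the assumed mapping, `build_gau_args({"providers": ["wayback", "otx"], "from": "202301", "subdomains": True})` returns `["gau", "--providers", "wayback,otx", "--from", "202301", "--subs"]`, while a date such as `"2023-01"` raises a `ValueError` before any request is made.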