cewl

Cewl (Custom Word List generator) spiders a website and collects the words it finds, producing word lists suited to password cracking. It can also extract email addresses and metadata from the content it visits. In automated security workflows, it supplies target-specific information for vulnerability assessment and penetration testing.

Ideal Use Cases & Fit

Cewl fits best where a detailed enumeration of a target's content matters, such as the early reconnaissance stage of a penetration test. It is particularly useful for building target-specific word lists for brute-force attacks and for finding email addresses or metadata that may expose weaknesses. It may not be suitable for environments where high request volume is restricted or where crawling offsite content is undesirable.

Value in Workflows

Integrating Cewl into a security workflow automates part of the reconnaissance phase by collecting useful information about target applications. Placed early in the workflow, it produces word lists that later password-cracking steps can use to test for weak credentials, which makes vulnerability assessments more effective. The metadata it gathers also gives security teams broader context for subsequent analysis.

Input Data

Cewl expects newline-separated URLs as input. Each URL is a target at which the spider begins crawling. An example input:

https://example.com
https://test.com
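
As a rough illustration of this input format, the following Python sketch reads newline-separated text and keeps only lines that look like absolute http(s) URLs. The function name parse_targets is hypothetical, and the validation rule is an assumption made for this example; it is not a description of how Cewl itself checks its input.

```python
from urllib.parse import urlparse


def parse_targets(raw: str) -> list[str]:
    """Return the usable crawl targets from newline-separated input.

    Hypothetical helper for illustration; not part of Cewl.
    """
    targets = []
    for line in raw.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parsed = urlparse(url)
        # Assumed rule: keep only absolute http(s) URLs that name a host.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            targets.append(url)
    return targets


if __name__ == "__main__":
    print(parse_targets("https://example.com\nhttps://test.com\n"))
```

Filtering the list before it reaches the tool keeps malformed lines from being treated as crawl targets.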

Configuration

depth: Controls how many link levels deep the spider crawls into the website; defaults to 2.
min-word-length: Sets the minimum length for words to be included in the word list; defaults to 3.
max-word-length: Sets the maximum length for words to be included in the word list.
offsite: Allows the spider to visit external sites; defaults to false.
with-numbers: Includes words that contain numbers as well as letters; defaults to false.
lowercase: Converts all parsed words to lowercase; defaults to false.
meta: Includes metadata in the output; defaults to false.
email: Includes email addresses in the output; defaults to false.
count: Shows the number of occurrences of each word found; defaults to false.
proxy: Defines the proxy host to use for requests.
proxy-port: Specifies the port of the proxy.
user-agent: Sets the user agent sent with requests; defaults to Mozilla/5.0.
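
To make the word-related options more concrete, the sketch below applies filters analogous to min-word-length, max-word-length, with-numbers, lowercase, and count to a piece of text. It is a simplified illustration under stated assumptions, not Cewl's actual implementation: the function name build_wordlist and the tokenizing rules are invented for this example.

```python
import re
from collections import Counter


def build_wordlist(text, min_word_length=3, max_word_length=None,
                   with_numbers=False, lowercase=False):
    """Count the words in text that pass the configured filters.

    Hypothetical helper mirroring the option names above.
    """
    # Assumption: letters only by default, so digits act as separators;
    # with_numbers keeps letters and digits together in one word.
    pattern = r"[A-Za-z0-9]+" if with_numbers else r"[A-Za-z]+"
    counts = Counter()
    for word in re.findall(pattern, text):
        if lowercase:
            word = word.lower()
        if len(word) < min_word_length:
            continue  # too short
        if max_word_length is not None and len(word) > max_word_length:
            continue  # too long
        counts[word] += 1
    return counts


if __name__ == "__main__":
    sample = "Admin portal admin2024 login to Admin"
    # most_common() plays the role of the count option: word plus frequency.
    for word, n in build_wordlist(sample, lowercase=True).most_common():
        print(word, n)
```

Returning a Counter keeps the per-word frequencies available, so the same result can be printed with or without counts.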
