Reconnaissance Fundamentals
Mapping the attack surface to find what others miss
What You'll Discover
🎯 Why This Matters
The best bugs hide on assets others don't know exist. A forgotten subdomain, an old API version, a staging server - these are where you find critical vulnerabilities. While hundreds of hunters compete on the main website, thorough reconnaissance turns up targets you have all to yourself. Good recon is often the difference between constant duplicates and consistent bounties.
🔍 What You'll Learn
- Subdomain enumeration (passive and active) and why it works
- Content and endpoint discovery techniques
- Technology fingerprinting to guide your testing
- Historical data analysis (Wayback Machine and other sources)
- Building a complete recon workflow with practical examples
🚀 Your First Win
By the end of this chapter, you'll have a complete reconnaissance methodology that uncovers hidden attack surface - and you'll understand WHY each technique works.
🔧 Try This Right Now
Run a foundational recon on any in-scope target (use example.com for practice):
# STEP 1: Find subdomains (passive - uses public data sources)
subfinder -d <target> -o subdomains.txt
# Expected: List of subdomains saved to subdomains.txt
# Example output: api.target.com, staging.target.com, dev.target.com
# STEP 2: Check which subdomains are alive (have web servers)
cat subdomains.txt | httpx -o live.txt
# Expected: Only responding hosts saved to live.txt
# These are your actual targets
# STEP 3: Find hidden directories and files
ffuf -u https://<target>/FUZZ -w /usr/share/seclists/Discovery/Web-Content/common.txt
# Expected: List of discovered paths like /admin, /api, /backup
# STEP 4: Check historical URLs (what existed in the past)
echo "<target>" | waybackurls | tee wayback.txt
# Expected: URLs archived by Wayback Machine over years
Success indicator: You'll see subdomains, live hosts, hidden paths, and historical URLs - assets you'd never find by clicking around the website.
Skills You'll Master
- Subdomain Enumeration: Find all subdomains using passive and active techniques
- Content Discovery: Uncover hidden directories, files, and API endpoints
- Historical Analysis: Find forgotten endpoints using archived data
- Workflow Building: Combine tools into an efficient, repeatable process
Understanding Reconnaissance
"You can't hack what you don't know exists. Recon is where bug bounty is won or lost."
Passive vs Active Reconnaissance
There are two fundamentally different approaches to reconnaissance:
Passive Recon — Gathering information without directly touching the target. You query third-party databases, search engines, and public records. The target has no idea you're researching them because you never send them any traffic.
Active Recon — Directly interacting with the target to discover information. This includes port scanning, directory brute-forcing, and subdomain brute-forcing. The target can see this traffic in their logs.
Why it matters: Passive recon is always safe and within scope. Active recon may be restricted by program rules (some prohibit automated scanning). Always check the program's policy, and when in doubt, start passive.
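To make the difference concrete, here is a minimal contrast, assuming curl and waybackurls are installed (example.com and /admin are placeholders):
# PASSIVE: query a third-party archive about the target - no traffic reaches them
echo "example.com" | waybackurls | head -n 5
# ACTIVE: request a guessed path from the target itself - this shows up in their logs
curl -s -o /dev/null -w "%{http_code}\n" "https://example.com/admin"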
Why Subdomain Enumeration Works
You might wonder: how can tools find subdomains without asking the target? The internet leaves traces everywhere:
- Certificate Transparency Logs: When a company gets an SSL certificate for api.company.com, it gets logged publicly. Tools query these logs.
- DNS Records: Misconfigured name servers sometimes allow zone transfers, and passive DNS services keep searchable databases of past resolutions.
- Search Engines: Google indexes pages, and those pages often link to subdomains.
- Web Archives: The Wayback Machine crawls the internet and records URLs it finds, including subdomains.
- Security Research: Services like Shodan and Censys actively scan the internet and record what they find.
Tools like subfinder query all these sources simultaneously, returning in seconds subdomains that would take hours to find manually.
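If you want to see one of these sources for yourself, certificate transparency logs can be queried by hand. A minimal sketch using crt.sh, assuming curl and jq are installed (example.com is a placeholder):
# %25 is a URL-encoded % wildcard, so this matches every certificate under the domain
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sort -u
# Expected: hostnames like api.example.com, mail.example.com
# This is roughly what subfinder does across dozens of sources at once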
Why Historical URLs Matter
The Wayback Machine (archive.org) has been crawling the internet since 1996, saving snapshots of websites. When developers remove a feature from a website, they often:
- Remove the link from the frontend
- Forget to remove the backend code
The endpoint still exists and still works - they assume it's safe because no one can find it. But the Wayback Machine remembers. Tools like waybackurls let you see every URL that ever existed on a domain, revealing "hidden" endpoints that may still be vulnerable.
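The same data is reachable without any tool via the Wayback Machine's public CDX API; a minimal sketch with curl (example.com is a placeholder):
# List archived URLs for the domain, one per line, duplicates collapsed by URL key
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/*&fl=original&collapse=urlkey" | head -n 20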
Reconnaissance Workflow
Step 1: Subdomain Enumeration
Find all subdomains belonging to your target. Start passive, then go active if scope allows.
# PASSIVE ENUMERATION (safe, no direct contact with target)
# These tools query public databases, not the target itself
# Subfinder - fast, queries many sources
subfinder -d <target> -o subs_subfinder.txt
# What it does: Queries certificate logs, DNS databases, search engines
# Expected output: 50-500+ subdomains depending on target size
# Amass - thorough, takes longer but finds more
amass enum -passive -d <target> -o subs_amass.txt
# What it does: Queries even more sources than subfinder
# Expected output: Often finds additional subdomains subfinder misses
# Note: Can take 5-15 minutes for large targets
# Assetfinder - quick and simple
assetfinder --subs-only <target> > subs_assetfinder.txt
# What it does: Quick lookup from common sources
# Expected output: Overlaps with above but occasionally finds unique ones
# Combine all results and remove duplicates
cat subs_*.txt | sort -u > all_subdomains.txt
# What this does: Merges all files, sorts alphabetically, removes duplicates
# Expected output: A clean list of unique subdomains
# ─────────────────────────────────────────────────────────
# ACTIVE ENUMERATION (check scope first!)
# This sends traffic to the target - only do if program allows
# Brute-force subdomain guessing
ffuf -u https://FUZZ.<target> \
-w /usr/share/seclists/Discovery/DNS/subdomains-top1million-5000.txt \
-mc 200,301,302,403
# What it does: Tries common subdomain names (admin, api, dev, staging...)
# -mc: Only show these status codes (filter out "not found" responses)
# Expected output: Subdomains that exist but weren't in public databases
# IMPORTANT: Active brute-forcing can trigger rate limiting or alerts
# Be respectful, use reasonable thread counts, check program rules
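One optional cleanup step before probing: confirm which of the combined subdomains still resolve in DNS. A minimal sketch, assuming projectdiscovery's dnsx is installed:
# Keep only subdomains that resolve to an IP address
cat all_subdomains.txt | dnsx -silent -o resolved_subdomains.txt
# Expected: a shorter list - public sources often contain stale entries that no longer resolve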
Step 2: Live Host Discovery
Not all subdomains have active web servers. Filter down to hosts that actually respond.
# Probe for live web servers
cat all_subdomains.txt | httpx -o live_hosts.txt
# What it does: Sends HTTP requests to each subdomain, keeps only those that respond
# Expected output: Subset of subdomains with active web servers
# This is your actual target list
# Get more details about each live host
cat all_subdomains.txt | httpx -status-code -title -tech-detect -o detailed_hosts.txt
# Additional info:
# -status-code: Shows HTTP status (200, 301, 403, etc.)
# -title: Shows page title (reveals what the app is)
# -tech-detect: Identifies technologies (WordPress, React, nginx, etc.)
# This helps you prioritize - interesting titles like "Admin Panel" or "API Docs" stand out
# Check non-standard ports (many apps don't run on 80/443)
cat all_subdomains.txt | httpx -ports 80,443,8080,8443,8000,3000,9000 -o all_ports.txt
# What it does: Checks common web ports beyond the default
# Why: Development servers often run on port 3000, 8080, etc.
# Finding a dev server on port 8080 is often more valuable than the main site
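Once you have the detailed output, a quick grep helps surface the hosts worth testing first; a simple sketch against the detailed_hosts.txt file from above (adjust the keywords to taste):
# Surface hosts whose titles or detected tech hint at something interesting
grep -iE 'admin|dashboard|login|api|swagger|staging|jenkins' detailed_hosts.txt
# Expected: a short list of hosts to prioritize over generic marketing pages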
Step 3: Content Discovery
Find hidden directories, files, and endpoints that aren't linked from the main pages.
# DIRECTORY FUZZING
# "Fuzzing" means trying many inputs to see what exists
# Find hidden directories
ffuf -u https://<target>/FUZZ \
-w /usr/share/seclists/Discovery/Web-Content/raft-medium-directories.txt \
-fc 404
# What it does: Replaces FUZZ with each word in the wordlist
# Example: tries /admin, /backup, /config, /api, etc.
# -fc 404: Filter out (hide) 404 "Not Found" responses
# Expected output: Directories that exist but aren't linked anywhere
# Find hidden files
ffuf -u https://<target>/FUZZ \
-w /usr/share/seclists/Discovery/Web-Content/raft-medium-files.txt \
-fc 404
# Looks for files like: config.php, .htaccess, backup.sql, .git/config
# These can contain sensitive information
# API ENDPOINT DISCOVERY
ffuf -u https://<target>/api/FUZZ \
-w /usr/share/seclists/Discovery/Web-Content/api/api-endpoints.txt
# Common API patterns: /api/users, /api/v1/accounts, /api/admin
# APIs are often less protected than web interfaces
# PARAMETER DISCOVERY
# Once you find an endpoint, discover what parameters it accepts
arjun -u https://<target>/endpoint
# What it does: Tries common parameter names to see which ones the endpoint accepts
# Example: Discovers that /search accepts ?q=, ?page=, ?debug=
# Finding hidden parameters like ?debug=true can reveal vulnerabilities
# WORDLIST NOTE:
# SecLists is the industry standard wordlist collection
# Install: git clone https://github.com/danielmiessler/SecLists.git /usr/share/seclists
# Or: apt install seclists (on Kali/Debian)
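One refinement worth trying on file fuzzing: ffuf's -e flag appends extensions to every wordlist entry, which pairs well with the technology fingerprint from httpx. A sketch assuming the SecLists layout above:
# Append likely extensions to each word; match them to the stack httpx detected
ffuf -u https://<target>/FUZZ \
-w /usr/share/seclists/Discovery/Web-Content/raft-medium-words.txt \
-e .php,.bak,.old \
-fc 404
# Example hits: config.php, settings.bak - backup copies developers left behind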
Step 4: Historical Analysis
Find URLs that existed in the past - forgotten endpoints that may still work.
# WAYBACK MACHINE URLs
echo "<target>" | waybackurls | tee wayback_urls.txt
# What it does: Queries Wayback Machine for all URLs ever archived for this domain
# Expected output: Hundreds to thousands of historical URLs
# These are URLs that existed at some point - many may still work
# GAU - Get All URLs (queries multiple sources)
gau <target> | tee gau_urls.txt
# What it does: Queries Wayback Machine + Common Crawl + other archives
# Often finds more URLs than waybackurls alone
# FILTER FOR INTERESTING ENDPOINTS
# Find potentially vulnerable file types
cat wayback_urls.txt | grep -E '\.(php|asp|aspx|jsp|json|xml|txt|log|bak|old)' | sort -u
# Why: These file types often process input (vuln to injection) or contain data
# Find URLs with parameters (potential injection points)
cat wayback_urls.txt | grep '=' | sort -u
# Why: Parameters are where user input goes - prime targets for XSS, SQLi, IDOR
# Example finds: /search?q=, /user?id=, /download?file=
# Find API endpoints
cat wayback_urls.txt | grep -E '/api/|/v1/|/v2/|/graphql' | sort -u
# Why: APIs often have authorization issues, especially old versions
# Find admin/sensitive paths
cat wayback_urls.txt | grep -iE 'admin|backup|config|debug|test|staging' | sort -u
# Why: These paths often have weaker security or expose sensitive info
# IMPORTANT: A URL in the archive doesn't guarantee it still works
# Test each interesting URL to see if it still responds
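To separate the live ones from the dead ones in bulk, pipe the archived URLs straight into httpx; a minimal sketch reusing the files above:
# Re-check which archived parameterized URLs still respond today
cat wayback_urls.txt | grep '=' | sort -u | httpx -silent -mc 200,301,302,403 -o still_live.txt
# Expected: the subset of historical URLs that still return a response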
Real Recon Wins
🏆 The Forgotten Staging Server
A hunter ran subfinder and found staging.company.com in the results. The main site was hardened, but the staging server had debug mode enabled. Error messages exposed database credentials, leading to a $5,000 bounty for information disclosure.
Lesson: Non-production environments (staging, dev, test, uat) often have weaker security because developers assume they're hidden. They're not. Always enumerate subdomains.
🏆 The Wayback Machine Goldmine
Historical URLs revealed an old API endpoint /api/v1/users/export that was removed from the frontend two years ago. The hunter tried it - the endpoint still worked and had no authentication. It dumped the entire user database. $15,000 bounty.
Lesson: Developers remove links from the UI but forget to remove backend code. The Wayback Machine remembers what they forgot. Always check historical URLs.
🏆 The Non-Standard Port Discovery
After finding nothing on the main site, a hunter ran httpx with port scanning. Port 8443 had an admin panel for the application's internal tooling - with default credentials. Complete account takeover capability, rated Critical.
Lesson: Don't assume all web apps run on ports 80/443. Development tools, admin panels, and internal apps often run on non-standard ports. Check 8080, 8443, 3000, and others.
Frequently Asked Questions
How long should recon take?
Initial recon: 1-2 hours for a new target is reasonable. Run your tools, review results, identify interesting assets.
Continuous recon: Set up automated monitoring to catch new subdomains and changes (covered in the Automation chapter).
Balance: Don't spend weeks on recon before testing. The goal is finding vulnerabilities, not building a perfect asset inventory. Iterate between recon and testing.
Is subdomain brute forcing allowed?
Passive enumeration (subfinder, amass passive mode): Always safe. You're querying third-party databases, not the target.
Active brute forcing: Check the program's rules. Most wildcard scope programs (*.target.com) allow it, but some explicitly prohibit automated scanning. When in doubt, ask or stick to passive methods.
Rate limiting: Even when allowed, be respectful. Use reasonable thread counts, add delays if needed. Getting yourself blocked doesn't help anyone.
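If a program allows active scanning but asks for restraint, ffuf has throttling flags for exactly this; a sketch of conservative settings (tune them to the program's limits):
# Limit concurrency and request rate to stay polite
ffuf -u https://<target>/FUZZ \
-w /usr/share/seclists/Discovery/Web-Content/common.txt \
-t 10 \
-rate 20 \
-fc 404
# -t 10: 10 concurrent threads (ffuf's default is 40)
# -rate 20: cap at 20 requests per second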
What if I can't install these tools?
Online alternatives exist:
- Subdomains: crt.sh (certificate transparency search), dnsdumpster.com
- Historical URLs: web.archive.org (Wayback Machine website)
- Technology detection: Wappalyzer browser extension
Tools are faster and more comprehensive, but you can do useful recon with nothing but a web browser if needed.
How do I know what's actually in scope?
Read the program policy carefully. Common scope patterns:
- *.company.com - Any subdomain is in scope
- www.company.com only - Only the main site, nothing else
- Specific URLs listed - Only those exact targets
When a subdomain you find isn't explicitly mentioned, check if it matches the wildcard or ask the program. Testing out-of-scope assets removes your legal protection.
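A simple way to keep yourself honest is to filter your subdomain list against the scope before testing; a sketch assuming a wildcard scope of *.company.com and an out_of_scope.txt exclusion list you maintain by hand:
# Keep hosts matching the wildcard, then drop anything the program explicitly excludes
grep -E '\.company\.com$' all_subdomains.txt | grep -v -f out_of_scope.txt > in_scope.txt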
Where do I get wordlists?
SecLists is the gold standard - a curated collection of wordlists for all security testing needs:
git clone https://github.com/danielmiessler/SecLists.git
It includes wordlists for subdomain enumeration, directory fuzzing, parameter names, common passwords, and much more. Most examples in this course assume you have SecLists installed.
🎯 You've Got Recon Down!
You now have a complete reconnaissance workflow and understand why each technique works. Subdomain enumeration, live host discovery, content fuzzing, historical analysis - you'll find attack surface that other hunters miss.
Ready to build your testing methodology →