Subtrace v2 | Attack Surface Mapper

Subtrace v2 is a reconnaissance and attack surface mapping tool written in Python. It turns raw reconnaissance data into a structured view of a target’s exposure: subdomains, endpoints, historical URLs, client-side routes extracted from JavaScript, and enriched metadata such as reachability, technologies, and risk heuristics.

Disclaimer: This tool is intended for authorized security testing and controlled environments only. Passive data sources and generated queries must be used responsibly and according to their terms of service.

🎯 What Does Subtrace Do?

Subtrace runs a modular pipeline that combines passive discovery, live crawling, JavaScript intelligence, technology fingerprinting, and risk scoring. Results are stored in a graph model to enable reporting and exporting into formats useful for security analysis.

🧠 Use Cases

Bug bounty / asset mapping: Identify subdomains, endpoints, and interesting surfaces before testing.
Red team recon: Build a structured map of reachable assets and high-signal endpoints (admin/upload/auth/debug).
Blue team visibility: Identify exposed panels, misconfigured endpoints, and sensitive file patterns.
Client-side recon: Analyze JavaScript to extract endpoints, routes, tokens, and suspicious patterns.
Reporting: Produce an HTML report with distributions (origins, risks, technologies, reachability) for stakeholders.
Graph analysis: Export to Neo4j (Cypher) for deeper investigation and relationship exploration.

🔐 Implemented Features

Passive discovery: crt.sh, AlienVault OTX, HackerTarget, and Wayback Machine.
Async DNS resolution: Resolve subdomains to IPs and CNAME records.
Subdomain reachability probe: HTTP/HTTPS probing that records status code, final URL, and server header.
Live crawler: URL canonicalization, per-host rate limiting, robots.txt handling, depth/page limits.
HTML parsing: Links, scripts, forms, and metadata extraction.
JavaScript intelligence: Regex & AST extraction of routes/endpoints, token & websocket heuristics.
Source map analysis: Extract routes from JavaScript sourcemaps (.map).
Technology fingerprinting: Header/HTML/script/URL signatures (CDNs, frameworks, cloud providers, etc.).
Risk scoring: Heuristic scoring of endpoints based on URL patterns and method usage.
JS vulnerability scanning: built-in heuristics + optional integration with external tools (Semgrep & TruffleHog).
Google dorks generator: Produces targeted search queries (no scraping).
Exports: HTML, Markdown, JSON, CSV, Neo4j Cypher.
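
As an illustration of the URL canonicalization step mentioned for the live crawler, here is a minimal sketch using only the standard library. The function name `canonicalize` and the specific normalization rules (lowercasing, dropping default ports and fragments, sorting query parameters) are assumptions chosen for the example, not necessarily the rules Subtrace applies.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Default ports that can be dropped without changing the target.
DEFAULT_PORTS = {"http": 80, "https": 443}


def canonicalize(url: str) -> str:
    """Return a normalized URL so that equivalent URLs compare equal.

    Hypothetical helper: userinfo is dropped and query parameters are sorted.
    """
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Keep the port only when it differs from the scheme default.
    if port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"
    # Sort parameters so ordering differences do not create duplicates.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment: it is never sent to the server.
    return urlunsplit((scheme, netloc, path, query, ""))
```

With rules like these, `HTTPS://Example.com:443/a?b=2&a=1#top` and `https://example.com/a?a=1&b=2` collapse to the same key, which keeps the crawler from visiting the same page twice.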

🏗️ How It Was Created (Architecture)

The project is implemented as a set of independent modules connected by a CLI orchestrator. Each stage outputs structured data that is added as nodes and edges to an Attack Surface Graph.

Graph model: Nodes and edges are managed using a NetworkX MultiDiGraph abstraction.
Discovery modules: Each passive provider returns sets of subdomains and/or URLs.
Crawler engine: Fetches pages with async HTTP requests, canonicalizes URLs, and extracts linked assets.
Heuristic analyzers: Risk scoring and fingerprinting add “meaning” to raw endpoints.
Reporting layer: Exporters transform graph data into readable formats for security workflows.
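
To make the heuristic analyzers more concrete, the sketch below scores an endpoint from URL patterns and the HTTP method. The pattern list, the weights, the thresholds, and the name `score_endpoint` are all hypothetical; only the low/medium/high/critical levels come from the report description.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern weights, for illustration only.
PATTERN_WEIGHTS = [
    (re.compile(r"/(admin|manage|console)\b", re.I), 40),
    (re.compile(r"/(upload|import)\b", re.I), 30),
    (re.compile(r"/(login|auth|oauth|token)\b", re.I), 20),
    (re.compile(r"/(debug|test|staging)\b", re.I), 30),
    (re.compile(r"\.(env|bak|sql|git)$", re.I), 50),
]
# Methods that usually change server state get a small bonus.
STATE_CHANGING = {"POST", "PUT", "PATCH", "DELETE"}


def score_endpoint(url: str, method: str = "GET"):
    """Return (score, risk) for an endpoint; the score is capped at 100."""
    path = urlsplit(url).path
    score = sum(weight for pattern, weight in PATTERN_WEIGHTS if pattern.search(path))
    if method.upper() in STATE_CHANGING:
        score += 10
    score = min(score, 100)
    # Assumed thresholds mapping the score to a risk level.
    if score >= 70:
        risk = "critical"
    elif score >= 40:
        risk = "high"
    elif score >= 20:
        risk = "medium"
    else:
        risk = "low"
    return score, risk
```

A purely additive score like this is easy to explain in a report, at the cost of treating every matching pattern as independent evidence.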

📦 Installation

Clone the GitHub repository:

https://github.com/<your-username>/<your-subtrace-repo>

Create a virtual environment and install the requirements:

```
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Install the additional dependencies used by optional modules (recommended):
```
pip install aiosqlite esprima
```
(Optional) Install external scanners:
```
pip install semgrep trufflehog
```
(Optional) Install Playwright for SPA scanning:
```
pip install playwright
playwright install
```
Run the tool:
```
python3 main.py
```

🖥️ Main Menu

Subtrace provides an interactive CLI with target selection, scanning modules, and export controls. The crawler stage also displays a progress bar with visited/discovered counters.

```
Attack Surface Mapper

T  Set Target
1  Passive Discovery (+DNS + probe)
2  Live Crawling
3  Playwright SPA Scan
4  Historical URL Analysis
5  Technology Fingerprinting
6  Risk Analysis
7  Run Full Pipeline
8  Export Reports (HTML/MD/JSON/CSV/Neo4j)
D  Google Dorks (generate links)
S  Settings (Semgrep/TruffleHog)
0  Exit
```

🚀 How to Use

A typical workflow is:

Set target (domain or URL)
Run full pipeline to gather and enrich findings
Export reports (HTML recommended)
Optional: enable Semgrep/TruffleHog in settings for external JS scanning

```
Select option: T
Target: example.com

Select option: 7
Running passive discovery...
Starting crawler...
Crawling complete ... visited=300 discovered=...
Detected technologies: ...
Risk scoring applied to endpoints.
```

🧾 Reporting & Findings Distribution

Subtrace generates an HTML report designed to provide both a quick overview and actionable details. The report includes:

KPIs: endpoints, subdomains, reachable assets, technologies, JS findings.
Risk distribution: low/medium/high/critical breakdown.
Provenance distribution: how many findings came from the live crawl, from routes discovered in JavaScript, and from Wayback URLs.
Subdomain reachability: DNS resolution + HTTP/HTTPS probe status in a table.
JavaScript findings: heuristics + external scanner alerts with evidence.
Google dorks: generated query links for manual investigation (no scraping).
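
The Google dorks item above can be pictured as a small query builder: it only formats search links and never sends a request. This is a rough sketch; the templates and the name `generate_dorks` are hypothetical, and the real generator may use a different set of queries.

```python
from urllib.parse import quote_plus

# Hypothetical dork templates built from standard Google search operators.
DORK_TEMPLATES = [
    "site:{domain} inurl:admin",
    "site:{domain} ext:env OR ext:bak OR ext:sql",
    'site:{domain} intitle:"index of"',
]


def generate_dorks(domain: str):
    """Return (query, link) pairs for manual investigation; no scraping."""
    links = []
    for template in DORK_TEMPLATES:
        query = template.format(domain=domain)
        # quote_plus encodes spaces as "+" and escapes ":" and quotes.
        links.append((query, "https://www.google.com/search?q=" + quote_plus(query)))
    return links
```

Because the output is plain links, the analyst decides which queries to open, which keeps usage within the search engine's terms of service.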

🧪 JavaScript Vulnerability Scanning (Heuristics + External Tools)

During crawling, Subtrace collects JavaScript files and performs:

Built-in heuristics: hardcoded secrets, embedded JWTs, internal references, dangerous sinks (eval, innerHTML, etc.).
Semgrep (optional): rule-based static analysis for JavaScript.
TruffleHog (optional): secret detection on downloaded JS content.

Note: These findings are indicators and require manual verification. The goal is to surface high-signal leads quickly during reconnaissance.
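
The built-in heuristics can be approximated with a handful of regular expressions applied line by line. The sketch below is an assumption about how such a scanner might look, not Subtrace's actual rule set; the patterns, the finding labels, and the name `scan_js` are illustrative.

```python
import re

# Illustrative patterns only; real rule sets are broader and tuned for false positives.
SECRET_RE = re.compile(
    r"""\b(api[_-]?key|secret|token|password)\b\s*[:=]\s*["']([^"']{8,})["']""",
    re.IGNORECASE,
)
# Three base64url segments, the first starting with "eyJ" (an encoded '{"').
JWT_RE = re.compile(r"\beyJ[\w-]{5,}\.[\w-]{5,}\.[\w-]{5,}")
SINK_RES = {
    "eval": re.compile(r"\beval\s*\("),
    "innerHTML": re.compile(r"\.innerHTML\s*=(?!=)"),  # assignment, not comparison
    "document.write": re.compile(r"\bdocument\.write\s*\("),
}


def scan_js(source: str):
    """Return a list of {"kind", "line"} findings for one JavaScript file."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SECRET_RE.search(line):
            findings.append({"kind": "hardcoded-secret", "line": lineno})
        if JWT_RE.search(line):
            findings.append({"kind": "embedded-jwt", "line": lineno})
        for name, pattern in SINK_RES.items():
            if pattern.search(line):
                findings.append({"kind": f"dangerous-sink:{name}", "line": lineno})
    return findings
```

Line-based matching misses constructs that span several lines or are minified into one, which is one reason these results are treated as leads rather than confirmed issues.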

📤 Export Formats

HTML: best for stakeholders and for a quick view of the distributions
JSON: structured output for tooling / automation
Markdown: simple report output for notes
CSV: endpoints table (risk/score/origin)
Neo4j Cypher: import into Neo4j for graph exploration
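
For example, the CSV export could be produced with the standard `csv` module. This is a sketch under assumptions: the `url` column name, the dict-based endpoint records, and the function `endpoints_to_csv` are hypothetical; only the risk/score/origin columns are taken from the list above.

```python
import csv
import io

# "risk", "score", and "origin" come from the export description; "url" is assumed.
FIELDS = ["url", "risk", "score", "origin"]


def endpoints_to_csv(endpoints):
    """Serialize endpoint dicts to CSV text, highest score first."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for endpoint in sorted(endpoints, key=lambda e: e.get("score", 0), reverse=True):
        # Keys outside FIELDS are ignored; missing keys become empty cells.
        writer.writerow(endpoint)
    return buf.getvalue()
```

Sorting by score before writing means the most interesting rows appear first when the file is opened in a spreadsheet.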

🧩 Neo4j Integration (How to Read the Graph)

Subtrace exports a Cypher script (subtrace_neo4j.cypher) that can be imported into Neo4j.

```
cypher-shell -a bolt://localhost:7687 -u neo4j -p yourpassword < subtrace_neo4j.cypher
```

Example queries:

```
MATCH (e:ENDPOINT)
RETURN e.value, e.risk, e.score
ORDER BY e.score DESC
LIMIT 25;
```
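
To show how rows matching that query might end up in the exported script, here is a rough sketch that renders one endpoint as a Cypher `MERGE` statement. The `ENDPOINT` label and the `value`, `risk`, and `score` properties are taken from the example query; the helper names and the exact statement shape are assumptions, and the real exporter may emit different Cypher.

```python
import json


def cypher_literal(value):
    """Render a Python string or number as a Cypher literal.

    Assumption: JSON's double-quoted, backslash-escaped form is used here
    because Cypher accepts the same quoting for plain strings and numbers.
    """
    return json.dumps(value)


def endpoint_to_cypher(value: str, risk: str, score: int) -> str:
    """Build an idempotent statement: MERGE creates the node only if absent."""
    return "MERGE (e:ENDPOINT {value: %s}) SET e.risk = %s, e.score = %s;" % (
        cypher_literal(value),
        cypher_literal(risk),
        cypher_literal(score),
    )
```

Using `MERGE` rather than `CREATE` lets the same script be imported more than once without duplicating nodes.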

✅ Summary

Subtrace v2 demonstrates end-to-end security tooling development: async networking, parsing, graph modeling, heuristic analysis, and reporting. The project is designed to be modular and extensible, so new data sources, extraction rules, and scanners can be added with minimal disruption.

📁 https://github.com/MarcoAbreu2002/Subtrace