2026 AI Security Research

VenomX Agentic Pentesting

11
Specialists
Autonomous agents per attack phase
17
Security Tools
Nmap, SQLmap, Metasploit, Hydra & more
32K
Token Context
Nemotron 30B via vLLM
CCSC
Published
CCSC Central Plains 2026 conference
01

The Agent Loop

VenomX doesn't just answer questions — it acts. A master-specialist multi-agent system orchestrates specialized agents, executes security tools, and reasons across a shared state graph toward a result, autonomously. Give it a target. Walk away.

The reasoning core is NVIDIA-Nemotron-3-Nano-30B, served through vLLM with a 32K token context window. BAAI/bge-m3 embeds a RAG knowledge base of CVE records from NVD, Exploit-DB, MITRE ATT&CK, HackTricks, GTFOBins, and more into pgvector so agents pull context-relevant threat intel at query time. Presented at CCSC Central Plains 2026.

Reliability

WAL-Backed State · Crash-safe engagement persistence
Canary Detection · Prompt hijack detection

Intelligence

RAG Dispatch · CVE-augmented tool reasoning
3-Path Classification · Complete, partial & emerging attack paths

Core Stack

Nemotron 30B + vLLM · Inference with abliteration
pgvector + BAAI/bge-m3 · Vector knowledge base
02

Demo

03

How the Agent Works

Master → Dispatch → Specialist → Graph. VenomX uses a master-specialist multi-agent architecture. A MasterAgent orchestrates eleven specialists, each confined to a distinct attack phase. Rather than talking to each other, they share state through a central FindingGraph, so every agent begins its task with full knowledge of what came before it.

1. Assess Graph State

Each dispatch cycle begins with MasterAgent feeding the LLM a current snapshot of the FindingGraph. The LLM reads the known hosts, open ports, services, credentials, and attack paths, then decides what gaps are worth filling next.

2. Dispatch a TaskSpec

The LLM returns a TaskSpec — a structured JSON object that names the target specialist, the objective, and the context pulled from the graph. The master routes it to whichever of the eleven specialists fits the current state. Not every specialist runs on every engagement. The SMB specialist only activates when ports 139 or 445 appear in the graph. SQL only runs when injectable web endpoints have been confirmed.
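
As a concrete illustration, a TaskSpec of this shape can be modeled as a small dataclass with strict JSON validation; the field names here are illustrative, not VenomX's actual schema:

```python
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskSpec:
    """One dispatch from master to specialist (illustrative fields)."""
    specialist: str   # which of the 11 specialists should run
    objective: str    # what the specialist is asked to accomplish
    context: dict = field(default_factory=dict)  # slice of the FindingGraph

def parse_task_spec(raw: str) -> Optional[TaskSpec]:
    """Validate the LLM's JSON reply; a malformed reply means re-prompt, not crash."""
    try:
        data = json.loads(raw)
        return TaskSpec(specialist=data["specialist"],
                        objective=data["objective"],
                        context=data.get("context", {}))
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

spec = parse_task_spec('{"specialist": "recon", "objective": "map 10.0.0.0/24"}')
```

Returning None instead of raising keeps the master's dispatch loop alive when the LLM emits malformed JSON.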

3. Specialist Mini-Loop

Each specialist runs its own internal Plan → Tool → Observe → Reason → Act loop against the assigned task. It calls its tools, parses output into structured objects, and reasons about what the results mean. Every successful tool call writes its findings into the shared FindingGraph on the spot, before the loop continues.
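
The mini-loop reduces to a small control structure. In this sketch, plan_fn stands in for the specialist's LLM call and the graph is a plain list; both are stand-ins, not the project's real interfaces:

```python
def specialist_loop(task, plan_fn, tools, graph, max_steps=5):
    """Illustrative Plan → Tool → Observe → Reason → Act loop."""
    summaries = []
    for _ in range(max_steps):
        step = plan_fn(task, graph)              # Plan: pick a tool, or stop
        if step is None:
            break
        raw = tools[step["tool"]](step["args"])  # Tool: execute
        findings = step["parse"](raw)            # Observe: structure the output
        graph.extend(findings)                   # write findings on the spot
        summaries.append(f"{step['tool']}: {len(findings)} findings")  # Reason/Act
    return "; ".join(summaries)

# One simulated turn: a fake "nmap" run that yields two open ports
graph = []
steps = iter([{"tool": "nmap", "args": "10.0.0.5",
               "parse": lambda raw: [{"port": 22}, {"port": 80}]},
              None])
summary = specialist_loop("scan the host", lambda task, g: next(steps),
                          {"nmap": lambda args: "raw nmap output"}, graph)
```

Note that the graph is mutated inside the loop, before the summary is returned, matching the write-as-you-go behavior described above.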

4. Graph Update & Summary

When the specialist finishes, it returns a plain-text summary to the master. The graph has already been updated throughout the run, so all new hosts, ports, services, credentials, and vulnerabilities are in place by the time the master reads back. The master appends the summary to its dispatch log and returns to step 1.

5. Report Generation

When the master determines the engagement is done — or the iteration limit is reached — it dispatches the report specialist. The report agent builds its sections directly from live graph data, covering discovered services, scored vulnerabilities, captured credentials, and attack paths. It makes a single LLM call for the executive summary and recommendations, then delivers the finished report as a .docx.
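
The graph-to-report step can be pictured as pure data assembly. The graph shape below is invented for illustration; only the executive summary would involve an LLM call:

```python
def build_report_sections(graph: dict) -> dict:
    """Assemble report sections straight from live graph data (illustrative shape)."""
    return {
        "services": [f"{h['host']}:{p['port']} {p['service']}"
                     for h in graph["hosts"] for p in h["ports"]],
        "vulnerabilities": sorted(graph["vulns"],          # worst first
                                  key=lambda v: v["cvss"], reverse=True),
        "credentials": graph["creds"],
        "attack_paths": graph["paths"],
    }

graph = {"hosts": [{"host": "10.0.0.5",
                    "ports": [{"port": 445, "service": "microsoft-ds"}]}],
         "vulns": [{"id": "low-sev-finding", "cvss": 5.0},
                   {"id": "critical-finding", "cvss": 9.8}],
         "creds": [], "paths": []}
sections = build_report_sections(graph)
```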

04

Architecture

MasterAgent & 11 Specialists

MasterAgent owns the shared state and drives the engagement. At startup it instantiates all eleven specialists (osint, recon, web, auth, vuln, sql, smb, ad, exploit, post, report) and passes each of them shared references to the graph and credential store. Each dispatch cycle, the master asks the LLM what to do next given the current graph state, then routes a TaskSpec to the right specialist.

FindingGraph — the State Bus

Specialists don't talk to each other directly. They all read from and write to a shared FindingGraph, a WAL-backed graph that persists each discovery the moment it happens. When the master dispatches a new specialist, it passes graph.summary_for_llm() as context, so each specialist arrives knowing every host, port, service, credential, and attack path that prior agents found.
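
A minimal sketch of the WAL-backed pattern (not the real class): every write hits an append-only log before memory, and a summary method flattens state into a prompt-sized snapshot:

```python
import json
import os
import tempfile

class FindingGraphSketch:
    """Toy WAL-backed finding store; illustrative, not VenomX's implementation."""
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.findings = []

    def add(self, finding: dict):
        # Append to the write-ahead log BEFORE updating memory, so a crash
        # mid-engagement never loses a recorded discovery.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps(finding) + "\n")
        self.findings.append(finding)

    def replay(self):
        # Recover in-memory state after a crash by replaying the log.
        with open(self.wal_path) as wal:
            self.findings = [json.loads(line) for line in wal]

    def summary_for_llm(self) -> str:
        # Compact snapshot handed to each newly dispatched specialist.
        return "\n".join(f"{f['type']}: {f['value']}" for f in self.findings)

wal = os.path.join(tempfile.mkdtemp(), "engagement.wal")
g = FindingGraphSketch(wal)
g.add({"type": "port", "value": "10.0.0.5:445"})
recovered = FindingGraphSketch(wal)   # simulate a restart after a crash
recovered.replay()
```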

Conditional Dispatch

Idle specialists stay out of the loop. SMB activates on ports 139 or 445. SQL runs only on confirmed injectable endpoints. AD activates when ports 88 and 389 are both present (DC signature). Post-exploitation fires only when a shell is gained or credentials are confirmed. WAL-backed persistence means findings survive a crash mid-engagement.
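
The activation conditions above reduce to predicates over graph state. The rule table here is a sketch; the real conditions live inside the master's dispatch logic:

```python
# Illustrative activation predicates over a graph-state dict (invented shape).
ACTIVATION_RULES = {
    "smb":  lambda g: bool({139, 445} & g["open_ports"]),        # SMB ports seen
    "sql":  lambda g: bool(g["injectable_endpoints"]),           # confirmed SQLi
    "ad":   lambda g: {88, 389} <= g["open_ports"],              # Kerberos + LDAP = DC
    "post": lambda g: g["shell"] or bool(g["confirmed_creds"]),  # foothold gained
}

def eligible_specialists(state: dict) -> list:
    """Conditional specialists the master may dispatch given current state."""
    return [name for name, rule in ACTIVATION_RULES.items() if rule(state)]

state = {"open_ports": {88, 389, 445},
         "injectable_endpoints": [], "shell": False, "confirmed_creds": []}
```

With ports 88, 389, and 445 open but no confirmed injection or foothold, only the SMB and AD specialists qualify.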

AttackPathFinder

Classifies complete, partial, and emerging attack paths across the graph after each dispatch cycle. Surfaces the highest-impact chains automatically, using CVSS scores as edge weights, without requiring manual correlation across agent outputs.
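
The core idea, CVSS-weighted graph traversal rather than a sorted finding list, can be sketched as a depth-first search that scores chains by summed severity (a toy version, not the project's algorithm):

```python
def best_attack_path(edges, start, goal):
    """Return (score, path) for the highest-CVSS-weight chain from start to goal.
    edges: {node: [(next_node, cvss_weight), ...]} is a toy finding graph."""
    best = (0.0, [])

    def dfs(node, score, path):
        nonlocal best
        if node == goal:
            if score > best[0]:
                best = (score, path)
            return
        for nxt, cvss in edges.get(node, []):
            if nxt not in path:              # no cycles
                dfs(nxt, score + cvss, path + [nxt])

    dfs(start, 0.0, [start])
    return best

# Two low-severity findings chain into a higher-impact path than one big CVE
edges = {
    "foothold":             [("smb-null-session", 3.1), ("web-sqli", 8.2)],
    "smb-null-session":     [("password-policy-leak", 2.7)],
    "password-policy-leak": [("domain-admin", 7.5)],
    "web-sqli":             [],
}
score, path = best_attack_path(edges, "foothold", "domain-admin")
```

The 8.2-severity SQLi dead-ends, while two low-severity SMB findings chain through to domain admin; a sorted list would rank them backwards.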

Python · master_agent.py
# Shared state — owned by master, passed by reference to all 11 specialists
self.graph = FindingGraph(session_id, wal_path, json_path)
self.credential_store = CredentialStore(session_id, persist_path)

self._specialists = {
    "osint":   OsintSpecialist(**shared),    # subfinder
    "recon":   ReconSpecialist(**shared),    # masscan + nmap + netcat
    "web":     WebSpecialist(**shared),      # httpx + nikto + gobuster + nuclei + wpscan
    "auth":    AuthSpecialist(**shared),     # kerbrute + hydra
    "vuln":    VulnSpecialist(**shared),     # searchsploit + metasploit
    "sql":     SqlSpecialist(**shared),      # sqlmap
    "smb":     SmbSpecialist(**shared),      # enum4linux + netexec
    "ad":      ADSpecialist(**shared),       # getuserspns + getnpusers
    "exploit": ExploitSpecialist(**shared),  # metasploit (Phase 2 only)
    "post":    PostSpecialist(**shared),     # netexec (post-exploitation)
    "report":  ReportSpecialist(**shared),   # reads graph, writes report
}

for _ in range(self.MAX_DISPATCHES):
    task = self._decide_next_task(user_input)  # LLM reads graph, returns TaskSpec
    if task is DONE:
        break
    result = self._specialists[task.specialist].run(task)
    self._dispatch_log.append(result.summary)  # graph already updated by specialist
05

Tool Inventory

Seventeen security tools, each wrapped in a Python interface that sanitizes input, manages execution, and structures output for the agent's observe step.
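
The wrapper pattern looks roughly like this: sanitize the target, execute with an argv list (never a shell) under a timeout, then hand structured output to the observe step. A generic sketch, exercised here with echo rather than a real scanner:

```python
import re
import subprocess

class ToolWrapper:
    """Generic sanitize → execute → structure wrapper (illustrative)."""
    TARGET_RE = re.compile(r"^[A-Za-z0-9.\-:/]+$")   # rejects shell metacharacters

    def __init__(self, binary: str, timeout: int = 300):
        self.binary = binary
        self.timeout = timeout

    def run(self, target: str, *args: str) -> dict:
        if not self.TARGET_RE.match(target):
            raise ValueError(f"unsafe target rejected: {target!r}")
        # argv list + shell=False means the target can never become shell syntax
        proc = subprocess.run([self.binary, *args, target],
                              capture_output=True, text=True,
                              timeout=self.timeout)
        return self.parse(proc.stdout)

    def parse(self, stdout: str) -> dict:
        # Real wrappers override this to emit typed objects for the observe step
        return {"raw": stdout.strip()}

out = ToolWrapper("echo").run("10.0.0.5")
```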

Reconnaissance

nmap · Host discovery & port scanning
masscan · High-speed port scanning
subfinder · Subdomain enumeration (OSINT)
httpx · Web fingerprinting

Web Testing

gobuster · Directory & endpoint discovery
nikto · Web server misconfiguration scan
sqlmap · SQL injection & data extraction
nuclei · Template-based vuln scanning
wpscan · WordPress enumeration

Credentials & Exploitation

hydra · Brute force & credential stuffing
kerbrute · Kerberos user enumeration
searchsploit · Exploit-DB offline search
metasploit · Exploitation framework

Windows / AD

enum4linux · SMB & Samba enumeration
netexec · SMB / Windows post-exploitation
GetUserSPNs · Kerberoasting
GetNPUsers · AS-REP roasting
06

Output & Safety

Structured Output Parsing

Tool outputs are parsed into typed objects before touching the context window. Nmap XML becomes host and port dicts. SQLMap output becomes injection and database dicts. The LLM sees clean, structured data rather than hundreds of lines it has to interpret on its own.
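
For nmap specifically, the XML-to-dict collapse can be done with the standard library; this sketch assumes nmap's documented -oX output structure and keeps only what the agent needs:

```python
import xml.etree.ElementTree as ET

def parse_nmap_xml(xml_text: str) -> list:
    """Collapse nmap -oX output into compact host/port dicts (sketch)."""
    hosts = []
    for host in ET.fromstring(xml_text).iter("host"):
        addr = host.find("address").get("addr")
        ports = [{"port": int(p.get("portid")),
                  "proto": p.get("protocol"),
                  "service": p.find("service").get("name")}
                 for p in host.iter("port") if p.find("service") is not None]
        hosts.append({"host": addr, "ports": ports})
    return hosts

sample = """<nmaprun><host><address addr="10.0.0.5" addrtype="ipv4"/><ports>
<port protocol="tcp" portid="445"><state state="open"/>
<service name="microsoft-ds"/></port></ports></host></nmaprun>"""
hosts = parse_nmap_xml(sample)
```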

Context Window Budget

Nemotron 30B runs a 32K token context window — eight times the budget of the 9B it replaced. Per-specialist caps hold tool output to 6,000 chars and RAG threat intel to 3,000 per turn. Each specialist starts fresh, so findings from one phase never pollute the reasoning of the next.
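
The per-turn caps are plain character truncation applied before anything reaches the prompt; the cap values come from the text above, the helper itself is illustrative:

```python
TOOL_OUTPUT_CAP = 6_000   # max chars of tool output per turn
RAG_INTEL_CAP = 3_000     # max chars of retrieved threat intel per turn

def cap(text: str, limit: int, marker: str = "\n[truncated]") -> str:
    """Hard character cap so one verbose tool can't flood the 32K-token window."""
    if len(text) <= limit:
        return text
    return text[: limit - len(marker)] + marker
```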

Structural Guardrails

Safety controls are wired into the code, not whispered in a system prompt. Target scope is enforced at the tool wrapper level. Max iteration counts stop runaway loops. Subprocess timeouts kill stalled tools. The agent has no way to talk its way around any of this.
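
Scope enforcement at the tool layer can be as small as a CIDR allowlist checked before any subprocess launches. This is a sketch of the shape, not VenomX's code:

```python
import ipaddress

class ScopeGuard:
    """Engagement-scope allowlist enforced in code, outside the LLM's reach."""
    def __init__(self, cidrs):
        self.networks = [ipaddress.ip_network(c) for c in cidrs]

    def check(self, target: str) -> None:
        # Called by every tool wrapper before the subprocess is spawned;
        # nothing in the prompt can alter the allowlist.
        addr = ipaddress.ip_address(target)
        if not any(addr in net for net in self.networks):
            raise PermissionError(f"{target} is outside engagement scope")

guard = ScopeGuard(["10.0.0.0/24"])
guard.check("10.0.0.5")        # in scope: passes silently
```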

07

Key Learnings

Building a security tool forces you to reason from both sides of the wire at once. You design the attack surface and the safety gate in the same breath.

01

Multi-agent systems need explicit state contracts. Without a shared FindingGraph, eleven agents produce eleven disconnected reports, not one coherent attack picture.

02

Retrieval quality matters more than corpus size. A smaller, well-chunked security knowledge base outperforms a raw dump every time.

03

Model abliteration is a precise surgical operation, not a blanket jailbreak. Done wrong, it breaks the model's general reasoning alongside the safety filters.

04

CVSS scores are inputs, not outputs. The attack path that matters is the one connecting low-severity findings into a critical chain — and that requires graph traversal, not a sorted list.

05

Guardrails at the prompt level are worthless. Scope enforcement, iteration caps, and subprocess timeouts wired into the tool layer are the only controls that actually hold under a determined loop.