2026 AI Security Research

VenomX Agentic Pentesting

11
Specialists
Autonomous agents per attack phase
17
Security Tools
Nmap, SQLmap, Metasploit, Hydra & more
32K
Token Context
Nemotron 30B via vLLM
CCSC
Published
CCSC Central Plains 2026 conference
01

The Agent Loop

VenomX doesn't just answer questions — it acts. A master-specialist multi-agent system orchestrates specialized agents, executes security tools, and reasons across a shared state graph toward a result, autonomously. Give it a target. Walk away.

The reasoning core is NVIDIA-Nemotron-3-Nano-30B, served through vLLM with a 32K token context window. BAAI/bge-m3 embeds a RAG knowledge base of CVE records from NVD, Exploit-DB, MITRE ATT&CK, HackTricks, GTFOBins, and more into pgvector so agents pull context-relevant threat intel at query time. Presented at CCSC Central Plains 2026.

Reliability

WAL-Backed State · Crash-safe engagement persistence
Canary Detection · Prompt hijack detection

Intelligence

RAG Dispatch · CVE-augmented tool reasoning
3-Path Classification · Complete, partial & emerging attack paths

Core Stack

Nemotron 30B + vLLM · Inference with abliteration
pgvector + BAAI/bge-m3 · Vector knowledge base
02

Demo

03

How the Agent Works

Master → Dispatch → Specialist → Graph. VenomX uses a master-specialist multi-agent architecture. A MasterAgent orchestrates eleven specialists, each confined to a distinct attack phase. Rather than talking to each other, they share state through a central FindingGraph, so every agent begins its task with full knowledge of what came before it.

1. Assess Graph State

Each dispatch cycle begins with MasterAgent feeding the LLM a current snapshot of the FindingGraph. The LLM reads the known hosts, open ports, services, credentials, and attack paths, then decides what gaps are worth filling next.

2. Dispatch a TaskSpec

The LLM returns a TaskSpec — a structured JSON object that names the target specialist, the objective, and the context pulled from the graph. The master routes it to whichever of the eleven specialists fits the current state. Not every specialist runs on every engagement. The SMB specialist only activates when ports 139 or 445 appear in the graph. SQL only runs when injectable web endpoints have been confirmed.
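
As a concrete illustration, a TaskSpec of this shape can be modeled as a small dataclass with strict JSON validation; the field names here are illustrative, not VenomX's actual schema:

```python
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskSpec:
    """One dispatch from master to specialist (illustrative fields)."""
    specialist: str   # which of the 11 specialists should run
    objective: str    # what the specialist is asked to accomplish
    context: dict = field(default_factory=dict)  # slice of the FindingGraph

def parse_task_spec(raw: str) -> Optional[TaskSpec]:
    """Validate the LLM's JSON reply; a malformed reply means re-prompt, not crash."""
    try:
        data = json.loads(raw)
        return TaskSpec(specialist=data["specialist"],
                        objective=data["objective"],
                        context=data.get("context", {}))
    except (json.JSONDecodeError, KeyError, TypeError):
        return None

spec = parse_task_spec('{"specialist": "recon", "objective": "map 10.0.0.0/24"}')
```

Returning None instead of raising keeps the master's dispatch loop alive when the LLM emits malformed JSON.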

3. Specialist Mini-Loop

Each specialist runs its own internal Plan → Tool → Observe → Reason → Act loop against the assigned task. It calls its tools, parses output into structured objects, and reasons about what the results mean. Every successful tool call writes its findings into the shared FindingGraph on the spot, before the loop continues.
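
The mini-loop reduces to a small control structure. In this sketch, plan_fn stands in for the specialist's LLM call and the graph is a plain list; both are stand-ins, not the project's real interfaces:

```python
def specialist_loop(task, plan_fn, tools, graph, max_steps=5):
    """Illustrative Plan → Tool → Observe → Reason → Act loop."""
    summaries = []
    for _ in range(max_steps):
        step = plan_fn(task, graph)              # Plan: pick a tool, or stop
        if step is None:
            break
        raw = tools[step["tool"]](step["args"])  # Tool: execute
        findings = step["parse"](raw)            # Observe: structure the output
        graph.extend(findings)                   # write findings on the spot
        summaries.append(f"{step['tool']}: {len(findings)} findings")  # Reason/Act
    return "; ".join(summaries)

# One simulated turn: a fake "nmap" run that yields two open ports
graph = []
steps = iter([{"tool": "nmap", "args": "10.0.0.5",
               "parse": lambda raw: [{"port": 22}, {"port": 80}]},
              None])
summary = specialist_loop("scan the host", lambda task, g: next(steps),
                          {"nmap": lambda args: "raw nmap output"}, graph)
```

Note that the graph is mutated inside the loop, before the summary is returned, matching the write-as-you-go behavior described above.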

4. Graph Update & Summary

When the specialist finishes, it returns a plain-text summary to the master. The graph has already been updated throughout the run, so all new hosts, ports, services, credentials, and vulnerabilities are in place by the time the master reads back. The master appends the summary to its dispatch log and returns to step 1.

5. Report Generation

When the master determines the engagement is done — or the iteration limit is reached — it dispatches the report specialist. The report agent builds its sections directly from live graph data, covering discovered services, scored vulnerabilities, captured credentials, and attack paths. It makes a single LLM call for the executive summary and recommendations, then delivers the finished report as a .docx.
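
The graph-to-report step can be pictured as pure data assembly. The graph shape below is invented for illustration; only the executive summary would involve an LLM call:

```python
def build_report_sections(graph: dict) -> dict:
    """Assemble report sections straight from live graph data (illustrative shape)."""
    return {
        "services": [f"{h['host']}:{p['port']} {p['service']}"
                     for h in graph["hosts"] for p in h["ports"]],
        "vulnerabilities": sorted(graph["vulns"],          # worst first
                                  key=lambda v: v["cvss"], reverse=True),
        "credentials": graph["creds"],
        "attack_paths": graph["paths"],
    }

graph = {"hosts": [{"host": "10.0.0.5",
                    "ports": [{"port": 445, "service": "microsoft-ds"}]}],
         "vulns": [{"id": "low-sev-finding", "cvss": 5.0},
                   {"id": "critical-finding", "cvss": 9.8}],
         "creds": [], "paths": []}
sections = build_report_sections(graph)
```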

04

Architecture

MasterAgent & 11 Specialists

MasterAgent owns the shared state and drives the engagement. At startup it instantiates all eleven specialists (osint, recon, web, auth, vuln, sql, smb, ad, exploit, post, report) and passes each of them shared references to the graph and credential store. Each dispatch cycle, the master asks the LLM what to do next given the current graph state, then routes a TaskSpec to the right specialist.

FindingGraph — the State Bus

Specialists don't talk to each other directly. They all read from and write to a shared FindingGraph, a WAL-backed graph that persists each discovery the moment it happens. When the master dispatches a new specialist, it passes graph.summary_for_llm() as context, so each specialist arrives knowing every host, port, service, credential, and attack path that prior agents found.
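
A minimal sketch of the WAL-backed pattern (not the real class): every write hits an append-only log before memory, and a summary method flattens state into a prompt-sized snapshot:

```python
import json
import os
import tempfile

class FindingGraphSketch:
    """Toy WAL-backed finding store; illustrative, not VenomX's implementation."""
    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.findings = []

    def add(self, finding: dict):
        # Append to the write-ahead log BEFORE updating memory, so a crash
        # mid-engagement never loses a recorded discovery.
        with open(self.wal_path, "a") as wal:
            wal.write(json.dumps(finding) + "\n")
        self.findings.append(finding)

    def replay(self):
        # Recover in-memory state after a crash by replaying the log.
        with open(self.wal_path) as wal:
            self.findings = [json.loads(line) for line in wal]

    def summary_for_llm(self) -> str:
        # Compact snapshot handed to each newly dispatched specialist.
        return "\n".join(f"{f['type']}: {f['value']}" for f in self.findings)

wal = os.path.join(tempfile.mkdtemp(), "engagement.wal")
g = FindingGraphSketch(wal)
g.add({"type": "port", "value": "10.0.0.5:445"})
recovered = FindingGraphSketch(wal)   # simulate a restart after a crash
recovered.replay()
```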

Conditional Dispatch

Idle specialists stay out of the loop. SMB activates on ports 139 or 445. SQL runs only on confirmed injectable endpoints. AD activates when ports 88 and 389 are both present (DC signature). Post-exploitation fires only when a shell is gained or credentials are confirmed. WAL-backed persistence means findings survive a crash mid-engagement.
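
The activation conditions above reduce to predicates over graph state. The rule table here is a sketch; the real conditions live inside the master's dispatch logic:

```python
# Illustrative activation predicates over a graph-state dict (invented shape).
ACTIVATION_RULES = {
    "smb":  lambda g: bool({139, 445} & g["open_ports"]),        # SMB ports seen
    "sql":  lambda g: bool(g["injectable_endpoints"]),           # confirmed SQLi
    "ad":   lambda g: {88, 389} <= g["open_ports"],              # Kerberos + LDAP = DC
    "post": lambda g: g["shell"] or bool(g["confirmed_creds"]),  # foothold gained
}

def eligible_specialists(state: dict) -> list:
    """Conditional specialists the master may dispatch given current state."""
    return [name for name, rule in ACTIVATION_RULES.items() if rule(state)]

state = {"open_ports": {88, 389, 445},
         "injectable_endpoints": [], "shell": False, "confirmed_creds": []}
```

With ports 88, 389, and 445 open but no confirmed injection or foothold, only the SMB and AD specialists qualify.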

AttackPathFinder

Classifies complete, partial, and emerging attack paths across the graph after each dispatch cycle. Surfaces the highest-impact chains automatically, using CVSS scores as edge weights, without requiring manual correlation across agent outputs.
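
The core idea, CVSS-weighted graph traversal rather than a sorted finding list, can be sketched as a depth-first search that scores chains by summed severity (a toy version, not the project's algorithm):

```python
def best_attack_path(edges, start, goal):
    """Return (score, path) for the highest-CVSS-weight chain from start to goal.
    edges: {node: [(next_node, cvss_weight), ...]} is a toy finding graph."""
    best = (0.0, [])

    def dfs(node, score, path):
        nonlocal best
        if node == goal:
            if score > best[0]:
                best = (score, path)
            return
        for nxt, cvss in edges.get(node, []):
            if nxt not in path:              # no cycles
                dfs(nxt, score + cvss, path + [nxt])

    dfs(start, 0.0, [start])
    return best

# Two low-severity findings chain into a higher-impact path than one big CVE
edges = {
    "foothold":             [("smb-null-session", 3.1), ("web-sqli", 8.2)],
    "smb-null-session":     [("password-policy-leak", 2.7)],
    "password-policy-leak": [("domain-admin", 7.5)],
    "web-sqli":             [],
}
score, path = best_attack_path(edges, "foothold", "domain-admin")
```

The 8.2-severity SQLi dead-ends, while two low-severity SMB findings chain through to domain admin; a sorted list would rank them backwards.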

Python · master_agent.py
# Shared state — owned by master, passed by reference to all 11 specialists
self.graph = FindingGraph(session_id, wal_path, json_path)
self.credential_store = CredentialStore(session_id, persist_path)

self._specialists = {
    "osint":   OsintSpecialist(**shared),    # subfinder
    "recon":   ReconSpecialist(**shared),    # masscan + nmap + netcat
    "web":     WebSpecialist(**shared),      # httpx + nikto + gobuster + nuclei + wpscan
    "auth":    AuthSpecialist(**shared),     # kerbrute + hydra
    "vuln":    VulnSpecialist(**shared),     # searchsploit + metasploit
    "sql":     SqlSpecialist(**shared),      # sqlmap
    "smb":     SmbSpecialist(**shared),      # enum4linux + netexec
    "ad":      ADSpecialist(**shared),       # getuserspns + getnpusers
    "exploit": ExploitSpecialist(**shared),  # metasploit (Phase 2 only)
    "post":    PostSpecialist(**shared),     # netexec (post-exploitation)
    "report":  ReportSpecialist(**shared),   # reads graph, writes report
}

for _ in range(self.MAX_DISPATCHES):
    task = self._decide_next_task(user_input)  # LLM reads graph, returns TaskSpec
    if task is DONE:
        break
    result = self._specialists[task.specialist].run(task)
    self._dispatch_log.append(result.summary)  # graph already updated by specialist
05

Tool Inventory

Seventeen security tools, each wrapped in a Python interface that sanitizes input, manages execution, and structures output for the agent's observe step.
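
The wrapper pattern looks roughly like this: sanitize the target, execute with an argv list (never a shell) under a timeout, then hand structured output to the observe step. A generic sketch, exercised here with echo rather than a real scanner:

```python
import re
import subprocess

class ToolWrapper:
    """Generic sanitize → execute → structure wrapper (illustrative)."""
    TARGET_RE = re.compile(r"^[A-Za-z0-9.\-:/]+$")   # rejects shell metacharacters

    def __init__(self, binary: str, timeout: int = 300):
        self.binary = binary
        self.timeout = timeout

    def run(self, target: str, *args: str) -> dict:
        if not self.TARGET_RE.match(target):
            raise ValueError(f"unsafe target rejected: {target!r}")
        # argv list + shell=False means the target can never become shell syntax
        proc = subprocess.run([self.binary, *args, target],
                              capture_output=True, text=True,
                              timeout=self.timeout)
        return self.parse(proc.stdout)

    def parse(self, stdout: str) -> dict:
        # Real wrappers override this to emit typed objects for the observe step
        return {"raw": stdout.strip()}

out = ToolWrapper("echo").run("10.0.0.5")
```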

Reconnaissance

nmap · Host discovery & port scanning
masscan · High-speed port scanning
subfinder · Subdomain enumeration (OSINT)
httpx · Web fingerprinting

Web Testing

gobuster · Directory & endpoint discovery
nikto · Web server misconfiguration scan
sqlmap · SQL injection & data extraction
nuclei · Template-based vuln scanning
wpscan · WordPress enumeration

Credentials & Exploitation

hydra · Brute force & credential stuffing
kerbrute · Kerberos user enumeration
searchsploit · Exploit-DB offline search
metasploit · Exploitation framework

Windows / AD

enum4linux · SMB & Samba enumeration
netexec · SMB / Windows post-exploitation
GetUserSPNs · Kerberoasting
GetNPUsers · AS-REP roasting
06

Output & Safety

Structured Output Parsing

Tool outputs are parsed into typed objects before touching the context window. Nmap XML becomes host and port dicts. SQLMap output becomes injection and database dicts. The LLM sees clean, structured data rather than hundreds of lines it has to interpret on its own.
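
For nmap specifically, the XML-to-dict collapse can be done with the standard library; this sketch assumes nmap's documented -oX output structure and keeps only what the agent needs:

```python
import xml.etree.ElementTree as ET

def parse_nmap_xml(xml_text: str) -> list:
    """Collapse nmap -oX output into compact host/port dicts (sketch)."""
    hosts = []
    for host in ET.fromstring(xml_text).iter("host"):
        addr = host.find("address").get("addr")
        ports = [{"port": int(p.get("portid")),
                  "proto": p.get("protocol"),
                  "service": p.find("service").get("name")}
                 for p in host.iter("port") if p.find("service") is not None]
        hosts.append({"host": addr, "ports": ports})
    return hosts

sample = """<nmaprun><host><address addr="10.0.0.5" addrtype="ipv4"/><ports>
<port protocol="tcp" portid="445"><state state="open"/>
<service name="microsoft-ds"/></port></ports></host></nmaprun>"""
hosts = parse_nmap_xml(sample)
```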

Context Window Budget

Nemotron 30B runs a 32K token context window — eight times the budget of the 9B it replaced. Per-specialist caps hold tool output to 6,000 chars and RAG threat intel to 3,000 per turn. Each specialist starts fresh, so findings from one phase never pollute the reasoning of the next.
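
The per-turn caps are plain character truncation applied before anything reaches the prompt; the cap values come from the text above, the helper itself is illustrative:

```python
TOOL_OUTPUT_CAP = 6_000   # max chars of tool output per turn
RAG_INTEL_CAP = 3_000     # max chars of retrieved threat intel per turn

def cap(text: str, limit: int, marker: str = "\n[truncated]") -> str:
    """Hard character cap so one verbose tool can't flood the 32K-token window."""
    if len(text) <= limit:
        return text
    return text[: limit - len(marker)] + marker
```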

Structural Guardrails

Safety controls are wired into the code, not whispered in a system prompt. Target scope is enforced at the tool wrapper level. Max iteration counts stop runaway loops. Subprocess timeouts kill stalled tools. The agent has no way to talk its way around any of this.
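
Scope enforcement at the tool layer can be as small as a CIDR allowlist checked before any subprocess launches. This is a sketch of the shape, not VenomX's code:

```python
import ipaddress

class ScopeGuard:
    """Engagement-scope allowlist enforced in code, outside the LLM's reach."""
    def __init__(self, cidrs):
        self.networks = [ipaddress.ip_network(c) for c in cidrs]

    def check(self, target: str) -> None:
        # Called by every tool wrapper before the subprocess is spawned;
        # nothing in the prompt can alter the allowlist.
        addr = ipaddress.ip_address(target)
        if not any(addr in net for net in self.networks):
            raise PermissionError(f"{target} is outside engagement scope")

guard = ScopeGuard(["10.0.0.0/24"])
guard.check("10.0.0.5")        # in scope: passes silently
```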

07

Key Learnings

Building a security tool forces you to reason from both sides of the wire at once. You design the attack surface and the safety gate in the same breath.

01

Multi-agent systems need explicit state contracts. Without a shared FindingGraph, eleven agents produce eleven disconnected reports, not one coherent attack picture.

02

Retrieval quality matters more than corpus size. A smaller, well-chunked security knowledge base outperforms a raw dump every time.

03

Model abliteration is a precise surgical operation, not a blanket jailbreak. Done wrong, it breaks the model's general reasoning alongside the safety filters.

04

CVSS scores are inputs, not outputs. The attack path that matters is the one connecting low-severity findings into a critical chain — and that requires graph traversal, not a sorted list.

05

Guardrails at the prompt level are worthless. Scope enforcement, iteration caps, and subprocess timeouts wired into the tool layer are the only controls that actually hold under a determined loop.