I audited my own OSINT tool and found a query injection

Giuseppe Giona · 30 March 2026

What the audit found

• Cypher injection via interpolated Neo4j labels — any plugin result could execute arbitrary queries.
• Uncapped graph expansion — seeding a high-connectivity domain could produce 20,000+ nodes.
• API keys stored as plaintext in SQLite. Proxy routing used a predictable hash.
• Six commits to close everything. 331 tests now pass.

Why I audited it

threadr is a reconnaissance tool I built for security assessments. 17 plugins query public APIs and DNS records, build a graph of relationships between entities (emails, domains, IPs, usernames), and store everything in Neo4j. There's a Tor proxy layer, Lévy-distributed timing, k-anonymity decoys — all the anonymity engineering I wrote about previously.

I'd been adding features and tests for months but hadn't done a proper security review of the tool itself. The maths was well-tested. The plumbing wasn't. So I sat down and read every file looking for the things I'd flag in someone else's code.

The Cypher injection

This was the worst one. Neo4j's query language, Cypher, doesn't support parameterised labels or relationship types. You can parameterise values ($val) but not the label in MERGE (n:Label). So I'd been interpolating them:

// the vulnerable version
await session.run(
  `MERGE (n:${label} {${key}: $val}) SET n += $props`,
  { val: props[key], props }
)

The label and key variables come from plugin results. Every plugin returns a PluginResult with typed nodes and edges. In normal operation, labels are things like Email, Domain, IP.

But nothing enforced that at runtime. A malicious or buggy plugin could return a label containing Cypher syntax, and the query would execute it. That's a textbook injection.
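To make the failure concrete, here is a minimal sketch that builds the same query text as the vulnerable version without touching a database. The helper name buildVulnerableQuery, the address key, and the payload are illustrative assumptions, not taken from threadr.

```typescript
// Hypothetical helper: reproduces the interpolation from the vulnerable
// version so the resulting query text can be inspected directly.
function buildVulnerableQuery(label: string, key: string): string {
  return `MERGE (n:${label} {${key}: $val}) SET n += $props`
}

// A well-behaved plugin result.
const benign = buildVulnerableQuery('Email', 'address')
// MERGE (n:Email {address: $val}) SET n += $props

// Illustrative payload: closes the node pattern early, appends its own
// clauses, and comments out the rest of the original query.
const hostile = buildVulnerableQuery(
  'Email) WITH n MATCH (m) DETACH DELETE m //',
  'address',
)
// MERGE (n:Email) WITH n MATCH (m) DETACH DELETE m // {address: $val}) SET n += $props

console.log(benign)
console.log(hostile)
```

The second string is still a single well-formed statement, which is why nothing downstream would have noticed.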

The fix

The type system already defines the valid labels and relationship types as string literal unions. I turned those into runtime sets and validate before any query runs:

const VALID_LABELS: ReadonlySet<string> = new Set<NodeType>([
  'Email', 'Username', 'Person', 'Domain', 'IP',
  'Certificate', 'Breach', 'Phone', 'Organization',
  'Port', 'Repository',
])

function assertLabel(label: string): void {
  if (!VALID_LABELS.has(label))
    throw new Error(`invalid label: ${label}`)
}

// now: validate then backtick-escape
assertLabel(label)
await session.run(
  `MERGE (n:\`${label}\` {\`${key}\`: $val}) SET n += $props`,
  { val: props[key], props }
)

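Two layers: the whitelist rejects anything unexpected, and the backtick quoting makes Cypher read the label and key as identifiers rather than syntax. The quoting on its own is a weaker backstop than it looks: a value containing a backtick would still close the quote unless embedded backticks are doubled first, so the whitelist is the layer doing the real work, and the key needs the same check as the label. Belt and braces.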
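A minimal sketch of how both layers could be applied to the label and the key alike. The whitelist contents and the names assertAllowed, quoteIdentifier, and buildMergeQuery are assumptions for illustration; the only Cypher rule relied on is that a backtick inside a backtick-quoted identifier is written twice.

```typescript
// Assumed whitelists for illustration; the real sets would come from
// threadr's own type definitions.
const VALID_LABELS: ReadonlySet<string> = new Set(['Email', 'Domain', 'IP'])
const VALID_KEYS: ReadonlySet<string> = new Set(['address', 'name', 'value'])

function assertAllowed(kind: string, value: string, allowed: ReadonlySet<string>): void {
  if (!allowed.has(value)) throw new Error(`invalid ${kind}: ${value}`)
}

// Cypher quotes identifiers with backticks; a literal backtick inside the
// identifier is escaped by doubling it.
function quoteIdentifier(name: string): string {
  return '`' + name.replace(/`/g, '``') + '`'
}

// Hypothetical query builder: validate first, then quote.
function buildMergeQuery(label: string, key: string): string {
  assertAllowed('label', label, VALID_LABELS)
  assertAllowed('property key', key, VALID_KEYS)
  return `MERGE (n:${quoteIdentifier(label)} {${quoteIdentifier(key)}: $val}) SET n += $props`
}

console.log(buildMergeQuery('Email', 'address'))
console.log(quoteIdentifier('Email` RETURN 1; //'))
```

With the doubling in place, a stray backtick stays inside the identifier instead of ending it, so the quoting layer holds even for input the whitelist never saw.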

Why it happened

Honestly, I knew Cypher didn't support parameterised labels when I wrote the code. I thought "the labels come from my typed plugin interface, so they're always valid." Which is true in the happy path. The problem is that TypeScript types vanish at runtime. A plugin returning { label: "Email\` RETURN 1; //" } passes type-checking because JSON.parse returns any, and any is assignable to NodeType without a check at the boundary where the JSON is parsed.

TypeScript catches mistakes during development. It does nothing against malformed data at runtime. If a value flows into a query, it needs runtime validation regardless of what the type system says.
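One way to put a runtime check at that boundary is a type guard that narrows unknown to NodeType only after looking at the actual value. This is a sketch under assumptions: the ParsedNode shape and the parseNode helper are hypothetical, and only the label list is taken from the whitelist shown earlier.

```typescript
const NODE_TYPES = [
  'Email', 'Username', 'Person', 'Domain', 'IP',
  'Certificate', 'Breach', 'Phone', 'Organization',
  'Port', 'Repository',
] as const
type NodeType = (typeof NODE_TYPES)[number]

const NODE_TYPE_SET: ReadonlySet<string> = new Set<string>(NODE_TYPES)

// Type guard: the compiler only treats the value as NodeType after this
// check has passed at runtime.
function isNodeType(value: unknown): value is NodeType {
  return typeof value === 'string' && NODE_TYPE_SET.has(value)
}

// Hypothetical shape for a single parsed node.
interface ParsedNode {
  label: NodeType
  props: Record<string, unknown>
}

// Parse as unknown, never as the target type, then validate field by field.
function parseNode(json: string): ParsedNode {
  const raw: unknown = JSON.parse(json)
  if (typeof raw !== 'object' || raw === null) throw new Error('node must be an object')
  const { label, props } = raw as { label?: unknown; props?: unknown }
  if (!isNodeType(label)) throw new Error(`invalid label: ${String(label)}`)
  if (typeof props !== 'object' || props === null || Array.isArray(props)) {
    throw new Error('props must be an object')
  }
  return { label, props: props as Record<string, unknown> }
}

console.log(parseNode('{"label":"Email","props":{}}').label)
```

Typing the parse result as unknown is the design choice that matters here: it forces every field through a check before the compiler will accept it as a NodeType.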

Everything else the audit found

Once I started looking, I found more:

Issue	Severity	Fix
Cypher injection via interpolated labels	critical	Whitelist + backtick escaping
No limit on graph expansion depth or batch size	high	MAX_DEPTH=2, MAX_BATCH_SIZE=200, MAX_TOTAL_NODES=2000
Default credentials in docker-compose	high	Required env vars, no defaults
O(n²) entity resolution	high	Blocking index with inverted token lookup
API keys stored as plaintext in SQLite	medium	AES-256-GCM encryption, HKDF-derived key
Proxy routing used Java's String.hashCode	medium	HMAC-SHA256 keyed by per-session nonce
No plugin execution timeout	medium	30s Promise.race wrapper on every plugin.run()
Neo4j failure permanently disabled writes	low	5-failure threshold with per-scan health reset
API key characters logged on burn	low	Removed key material from log output

Figure 1. Self-audit findings by severity (N = 9). All closed in 6 commits.

Bars are raw finding counts; the right-hand panel weights each by a CVSS-style severity coefficient (critical 9.0, high 7.0, medium 4.5, low 1.5) to produce a single risk total. Taking the counts from the table above (one critical, three high, three medium, two low), that total is 9.0 + 21.0 + 13.5 + 3.0 = 46.5. Even with most findings sitting at medium and low, the single critical and three highs contribute 30 of those 46.5 points, which is the right behaviour for a triage model. All nine were closed across six commits before the post was published; current test count: 331 across 31 files.
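The expansion caps from the table can be sketched as a breadth-first walk with three guards. The constant values are the ones listed above; everything else is an assumption for illustration, including the expand function, the Neighbours callback standing in for plugin runs, and the reading of MAX_BATCH_SIZE as "frontier nodes expanded per level".

```typescript
const MAX_DEPTH = 2
const MAX_BATCH_SIZE = 200
const MAX_TOTAL_NODES = 2000

// Hypothetical neighbour lookup; in a real scan this would be plugin output.
type Neighbours = (id: string) => string[]

function expand(seed: string, neighbours: Neighbours): Set<string> {
  const seen = new Set<string>([seed])
  let frontier: string[] = [seed]
  for (let depth = 0; depth < MAX_DEPTH && frontier.length > 0; depth++) {
    const next: string[] = []
    // Assumed meaning of the batch cap: expand at most this many nodes per level.
    for (const id of frontier.slice(0, MAX_BATCH_SIZE)) {
      for (const n of neighbours(id)) {
        if (seen.has(n)) continue
        // Hard stop once the global node budget is spent.
        if (seen.size >= MAX_TOTAL_NODES) return seen
        seen.add(n)
        next.push(n)
      }
    }
    frontier = next
  }
  return seen
}

// A hub where every node has 100 neighbours would reach 10,101 nodes at
// depth 2 without the cap.
const hub: Neighbours = (id) => Array.from({ length: 100 }, (_, i) => `${id}.${i}`)
console.log(expand('root', hub).size)
```

The total-node check sits inside the innermost loop so a single high-connectivity node cannot blow the budget before the next level boundary is reached.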

The O(n²) entity resolution

This one was interesting to fix. The resolver compares Person nodes to find duplicates: the same real person discovered through different plugins. It was doing all-pairs comparison, every person against every other, which is n(n−1)/2 pairs. 1,000 persons = 499,500 Jaro-Winkler string comparisons.

The fix uses a blocking index. Before comparing any pair, I extract tokens from each entity: exact emails, exact usernames, avatar hashes, and character bigrams from names. An inverted index maps each token to the entities that contain it. Only pairs sharing at least one token become candidates for full comparison.

// "John Doe" → bigrams: ["jo","oh","hn","n "," d","do","oe"]
// Two entities sharing the same exact-email token ("e:" + address) → candidate pair
// Bucket with >50 entities (e.g., bigram "th") → dropped as noise

const pairs = candidatePairs(indexed)
// 1000 persons: brute force = 499,500 pairs
//               blocked     = typically < 5,000

One property I test explicitly: if two entities share an exact email address, they must appear as a candidate pair. The blocking can miss fuzzy-only matches (two people named "Jon" and "John" with no other shared data), but that's an acceptable tradeoff — a name-only match would be capped at 0.6 confidence anyway.
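A simplified sketch of the blocking idea, not the code in the repo. The Person shape, the token prefixes u: and b:, and the helper tokensOf are assumptions; avatar hashes are left out for brevity. The sketch applies the bucket cap to bigram buckets only, which is one assumed way to keep the exact-email guarantee intact.

```typescript
interface Person { id: string; name: string; emails: string[]; usernames: string[] }

const MAX_BUCKET = 50 // bigram buckets larger than this are dropped as noise

// All-pairs count for comparison: n(n-1)/2.
function bruteForcePairs(n: number): number {
  return (n * (n - 1)) / 2
}

// Blocking tokens: exact emails, exact usernames, character bigrams of the name.
function tokensOf(p: Person): string[] {
  const out = new Set<string>()
  for (const e of p.emails) out.add(`e:${e.toLowerCase()}`)
  for (const u of p.usernames) out.add(`u:${u.toLowerCase()}`)
  const name = p.name.toLowerCase()
  for (let i = 0; i + 1 < name.length; i++) out.add(`b:${name.slice(i, i + 2)}`)
  return Array.from(out)
}

// Inverted index token -> ids; only ids sharing a bucket become candidates.
function candidatePairs(people: Person[]): Set<string> {
  const index = new Map<string, string[]>()
  for (const p of people) {
    for (const t of tokensOf(p)) {
      const bucket = index.get(t)
      if (bucket) bucket.push(p.id)
      else index.set(t, [p.id])
    }
  }
  const pairs = new Set<string>()
  index.forEach((ids, token) => {
    // Assumption: only bigram buckets are capped, so exact matches always survive.
    if (token.startsWith('b:') && ids.length > MAX_BUCKET) return
    for (let i = 0; i < ids.length; i++) {
      for (let j = i + 1; j < ids.length; j++) {
        pairs.add([ids[i], ids[j]].sort().join('|'))
      }
    }
  })
  return pairs
}

console.log(bruteForcePairs(1000)) // 499500
```

Each pair is stored under a sorted key so the same two entities found through several shared tokens are still compared only once.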

The proxy routing problem

threadr routes each plugin's requests through a different Tor exit node so no single exit sees the full query pattern. The mapping was done with Java's String.hashCode(): deterministic, fast, and completely predictable. Anyone who reads the source code knows which plugin uses which exit. That defeats the purpose.

The fix: HMAC-SHA256 keyed by a random nonce generated at the start of each scan session. The same plugin gets a fresh, unpredictable assignment on every scan, so reading the source no longer reveals which exit it uses. The nonce never leaves memory.
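The difference between the two mappings can be sketched with Node's built-in crypto module. The function names, the exitCount parameter, and the choice to reduce the digest to an index with a modulo are all assumptions for illustration; javaHashCode follows the published definition of Java's String.hashCode.

```typescript
import { createHmac, randomBytes } from 'node:crypto'

// Java's String.hashCode: h = 31*h + charCode, wrapped to a signed 32-bit int.
function javaHashCode(s: string): number {
  let h = 0
  for (let i = 0; i < s.length; i++) h = (Math.imul(31, h) + s.charCodeAt(i)) | 0
  return h
}

// Old-style mapping: depends only on the plugin name, so anyone can compute it.
function predictableExit(plugin: string, exitCount: number): number {
  return Math.abs(javaHashCode(plugin)) % exitCount
}

// Keyed mapping: without the per-session nonce the assignment cannot be
// reproduced. Taking the first 4 digest bytes modulo exitCount is an assumed
// reduction; it has a slight modulo bias that is negligible for small pools.
function keyedExit(plugin: string, nonce: Buffer, exitCount: number): number {
  const digest = createHmac('sha256', nonce).update(plugin).digest()
  return digest.readUInt32BE(0) % exitCount
}

// One nonce per scan session, held in memory only.
const sessionNonce = randomBytes(32)
console.log(predictableExit('abc', 8), keyedExit('abc', sessionNonce, 8))
```

With a finite pool, two scans can still land the same plugin on the same exit by chance; what changes is that the mapping can no longer be derived from the source code.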

What I took away from this

The maths was fine. The Lévy timing, the Dempster-Shafer fusion, the spectral clustering — all well-tested, all correct. The vulnerabilities were in the parts I thought were too simple to get wrong. String interpolation into a query. A missing cap on a loop. A hash function I picked for convenience.

I think that's common. The interesting code gets the attention. The glue code gets written once and forgotten. But the glue is where the attack surface lives, because that's where external data meets internal assumptions.

The other thing: TypeScript creates a false sense of security if you're not careful. The type says NodeType, which is a union of string literals. At compile time, that's a constraint. At runtime, after JSON.parse, it's just a string. Every trust boundary needs runtime validation, not just type annotations.

All fixes are in the threadr repo. 331 tests across 31 files. The blocking index, label whitelist, and HMAC routing are in packages/shared/src/scoring.ts, apps/worker/src/graph.ts, and apps/worker/src/proxy.ts respectively.