A reading list and a monthly digest.
Curated work shaping how I think about offensive security, ML supply chain, and the seams between them. The reading list is evergreen; the digest goes out monthly when there's something worth saying.
Reading list
Pushes provably-secure linguistic steganography toward higher embedding capacity by maintaining a list of candidate decodings rather than a single one. Directly relevant to the entropy-budget question in any LLM-mediated covert-channel design.
Style-level (not token- or syntax-level) backdoor triggers, generated by an LLM as a poisoned-sample synthesizer. Adds an auxiliary target loss to stabilize payload injection during fine-tuning. Evaluated against seven model families.
The canonical reference for byte-level steganography in float32 weight tensors. Explicitly defers the decoder to a separately-deployed loader, which is the substantive limitation when read against a channel/decoder/substrate framework.
Origin paper for trigger-based co-trained backdoors. The decoder and the channel are baked into the network's weights together, which is why detection has to be behavioral rather than static.
Replaces random combination of crowdsourced jailbreak ingredients with a contextual-bandit learner that scores combinations based on prior success. Roughly 2,200-parameter bandit on top of SBERT embeddings. Transfers across models without retraining.
Distributes adversarial intent across stateless turns, evading moderation that evaluates each turn independently. Notable for showing that the threat model "single-turn safety classifier" is incomplete against an attacker LLM operating across sessions.
Poisoning 0.1% of pre-training data is enough for three of four backdoor objectives (DoS, belief manipulation, jailbreaking) to survive post-training. DoS persists at 0.001%. The supply-chain layer the threat model has to start at.
Six-class pitfall taxonomy (P1–P6) split into statically-checkable (Tier-1) and trace/dataflow-dependent (Tier-2) classes. Three workflow challenges (email, document, crypto) with hardened-vs-baseline server pairs and three attack families: tool-metadata poisoning, puppet servers, image-to-tool chains.
First end-to-end empirical evaluation of attacks against MCP. Four attack categories: tool poisoning, puppet attacks, rug pull, and exploitation via malicious external resources. Useful as the lay-of-the-land paper before any MCP-specific work.
Concrete demonstration of cross-server data exfiltration in MCP. The barrier-to-entry argument matters: this is not a sophisticated attack class, which is the point.
ML-centered threat modeling applied to an agentic browser. Four prompt-injection techniques against the AI assistant, all chained to exfiltrate Gmail data. The methodology — TRAIL — is more transferable than any individual finding.
Two CVEs (CVE-2026-25905, CVE-2026-25904) in a popular MCP server template. The class of bug is a useful pattern: trusting that a Deno sandbox plus a containerized python runner will hold under MCP-style invocation.
Two remotely-exploitable memory-corruption bugs (CVE-2025-23310, CVE-2025-23311) in Triton's HTTP request handling, surfaced via static analysis plus chunked-encoding probing. The reminder: production inference servers are still C/C++ network services with all the attendant historical bug classes, and authentication is off by default.
A multi-stage vulnerability chain in the Triton Python backend, starting from a minor information leak about shared-memory region names and escalating to unauthenticated RCE. Useful as a case study in chaining low-severity primitives into a takeover.
Reference for CVE-2024-34359 (the chat-template Jinja RCE in llama-cpp-python) and the broader question of when loading a GGUF model can lead to server-side template injection. The case study for why loader extensions need the same threat-modeling rigor as the loader itself.
Modeling Okta in BloodHound Enterprise alongside AD, Entra, GitHub. The argument: identity boundaries between platforms are where attack paths actually live, and treating any single platform in isolation underrepresents real risk.
Long-form correction of decades of incorrect documentation around AD's AdminSDHolder mechanism. The kind of historical-grounding piece that's useful before doing anything privileged-account-related on AD engagements.
The AD CS paper that opened up the modern wave of ADCS work. Still the cleanest framing of what a tooling-up problem looks like before any tools exist.
Infrastructure-side measurements of AI adoption: 81% of cloud environments use managed AI services, 90% run self-hosted, 80% have MCP servers. The framing — AI as accumulated, not adopted — is a useful governance lens.
An authenticated git push achieves RCE on GitHub's backend through a delimiter-based internal protocol. Notable also as one of the first critical vulnerabilities the team credits to AI-augmented reverse engineering.
Tree-sitter plus rustworkx, packaged as Claude Code skills for blast-radius and taint-propagation analysis. Useful as a reference for how graph reasoning composes with LLM agents in a security workflow.
Monthly digest
Monthly digest pending. First issue when there's something worth saying.