Home/Signal

Signal

A reading list and a monthly digest.

Curated work shaping how I think about offensive security, ML supply chain, and the seams between them. The reading list is evergreen; the digest goes out monthly when there's something worth saying.

Reading list

Curated · 21 entries · 6 categories

ML supply chain & steganography

Provably Secure Steganography Based on List Decoding

Pang & Bai (Tsinghua University)

Pushes provably-secure linguistic steganography toward higher embedding capacity by maintaining a list of candidate decodings rather than a single one. Directly relevant to the entropy-budget question in any LLM-mediated covert-channel design.

arxiv.org · April 2026

Stealthy Backdoor Attacks against LLMs Based on Natural Style Triggers (BadStyle)

Wei et al.

Style-level (not token- or syntax-level) backdoor triggers, generated by an LLM as a poisoned-sample synthesizer. Adds an auxiliary target loss to stabilize payload injection during fine-tuning. Evaluated against seven model families.

arxiv.org · April 2026

EvilModel: Hiding Malware Inside of Neural Network Models

Wang, Liu, Cui

The canonical reference for byte-level steganography in float32 weight tensors. Explicitly defers the decoder to a separately-deployed loader, which is the substantive limitation when read against a channel/decoder/substrate framework.

arxiv.org · 2021

BadNets: Identifying Vulnerabilities in the ML Supply Chain

Gu, Dolan-Gavitt, Garg

Origin paper for trigger-based co-trained backdoors. The decoder and the channel are baked into the network's weights together, which is why detection has to be behavioral rather than static.

arxiv.org · 2017

LLM red-teaming & jailbreaks

Adaptive Instruction Composition for Automated LLM Red-Teaming

Zymet et al. (Capital One AI Foundations)

Replaces random combination of crowdsourced jailbreak ingredients with a contextual-bandit learner that scores combinations based on prior success. Roughly 2,200-parameter bandit on top of SBERT embeddings. Transfers across models without retraining.

arxiv.org · April 2026

Transient Turn Injection: Stateless Multi-Turn Vulnerabilities

Rayhan & Jahan

Distributes adversarial intent across stateless turns, evading moderation that evaluates each turn independently. Notable for showing that the threat model "single-turn safety classifier" is incomplete against an attacker LLM operating across sessions.

arxiv.org · April 2026

Persistent Pre-Training Poisoning of LLMs

Zhang, Rando, Evtimov, Carlini, Tramèr et al. (Meta · ETH Zürich · CMU · Google DeepMind)

Poisoning 0.1% of pre-training data is enough for three of four backdoor objectives (DoS, belief manipulation, jailbreaking) to survive post-training. DoS persists at 0.001%. The supply-chain layer the threat model has to start at.

arxiv.org · 2024

MCP & agentic security

MCP Pitfall Lab: Developer Pitfalls in MCP Tool Server Security

Hao & Tan

Six-class pitfall taxonomy (P1–P6) split into statically-checkable (Tier-1) and trace/dataflow-dependent (Tier-2) classes. Three workflow challenges (email, document, crypto) with hardened-vs-baseline server pairs and three attack families: tool-metadata poisoning, puppet servers, image-to-tool chains.

arxiv.org · April 2026

Beyond the Protocol: Attack Vectors in the MCP Ecosystem

Song et al.

First end-to-end empirical evaluation of attacks against MCP. Four attack categories: tool poisoning, puppet attacks, rug pull, and exploitation via malicious external resources. Useful as the lay-of-the-land paper before any MCP-specific work.

arxiv.org · 2025

Trivial Trojans: Cross-Tool Exfiltration via Minimal MCP Servers

Croce & South

Concrete demonstration of cross-server data exfiltration in MCP. The barrier-to-entry argument matters: this is not a sophisticated attack class, which is the point.

arxiv.org · July 2025

Threat modeling and prompt injection in Comet

Trail of Bits

ML-centered threat modeling applied to an agentic browser. Four prompt-injection techniques against the AI assistant, all chained to exfiltrate Gmail data. The methodology — TRAIL — is more transferable than any individual finding.

blog.trailofbits.com · February 2026

Inference infrastructure & ML platforms

mcp-run-python: lack of isolation, MCP takeover, Deno SSRF

Natan Nehorai (JFrog)

Two CVEs (CVE-2026-25905, CVE-2026-25904) in a popular MCP server template. The class of bug is a useful pattern: trusting that a Deno sandbox plus a containerized python runner will hold under MCP-style invocation.

research.jfrog.com · February 2026

Uncovering memory corruption in NVIDIA Triton (as a new hire)

Will Vandevanter (Trail of Bits)

Two remotely-exploitable memory-corruption bugs (CVE-2025-23310, CVE-2025-23311) in Triton's HTTP request handling, surfaced via static analysis plus chunked-encoding probing. The reminder: production inference servers are still C/C++ network services with all the attendant historical bug classes, and authentication is off by default.

blog.trailofbits.com · August 2025

Breaking NVIDIA Triton: CVE-2025-23319 vulnerability chain to RCE

Wiz Research

A multi-stage vulnerability chain in the Triton Python backend, starting from a minor information leak about shared-memory region names and escalating to unauthenticated RCE. Useful as a case study in chaining low-severity primitives into a takeover.

wiz.io · August 2025

GGUF-SSTI: Llama-Drama and the Jinja template attack surface

JFrog Security Research

Reference for CVE-2024-34359 (the chat-template Jinja RCE in llama-cpp-python) and the broader question of when loading a GGUF model can lead to server-side template injection. The case study for why loader extensions need the same threat-modeling rigor as the loader itself.

research.jfrog.com · 2024

Identity, AD, and lateral movement

Attack Paths Don't Stop at Identity Providers

Jared Atkinson (SpecterOps)

Modeling Okta in BloodHound Enterprise alongside AD, Entra, GitHub. The argument: identity boundaries between platforms are where attack paths actually live, and treating any single platform in isolation underrepresents real risk.

specterops.io · March 2026

AdminSDHolder: Misconceptions, Misconfigurations, and Myths

Jim Sykora (SpecterOps)

Long-form correction of decades of incorrect documentation around AD's AdminSDHolder mechanism. The kind of historical-grounding piece that's useful before doing anything privileged-account-related on AD engagements.

specterops.io · October 2025

Certified Pre-Owned

Will Schroeder & Lee Christensen

The AD CS paper that opened up the modern wave of ADCS work. Still the cleanest framing of what a tooling-up problem looks like before any tools exist.

specterops.io · 2021

AI x security writing

State of AI in the Cloud 2026

Wiz Research

Infrastructure-side measurements of AI adoption: 81% of cloud environments use managed AI services, 90% run self-hosted, 80% have MCP servers. The framing — AI as accumulated, not adopted — is a useful governance lens.

wiz.io · April 2026

GitHub RCE via X-Stat header injection (CVE-2026-3854)

Wiz Research

An authenticated git push achieves RCE on GitHub's backend through a delimiter-based internal protocol. Notable also as one of the first critical vulnerabilities the team credits to AI-augmented reverse engineering.

wiz.io · April 2026

Trailmark: turning code into security-analysis graphs

Trail of Bits

Tree-sitter plus rustworkx, packaged as Claude Code skills for blast-radius and taint-propagation analysis. Useful as a reference for how graph reasoning composes with LLM agents in a security workflow.

blog.trailofbits.com · April 2026

Monthly digest

First of the month

Monthly digest pending. First issue when there's something worth saying.