Sanctum
Technology should empower people, not control them.
Research · 2025–2026

LLM behavioral fingerprinting

Measuring the systematic behavioral signatures of language models to expose hidden bias and multi-actor training influence that standard safety evaluations miss.

AI security · Model assurance · Supply-chain · Research

What it is

Language models behave differently under different conditions, and those differences are measurable. Just as a forensic examiner identifies a subject by signature traits, this research extracts behavioral signatures from a model (response patterns, output quality, latency, language choices) to identify which model sits behind an unknown endpoint and to surface biases that are not disclosed.
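As a rough illustration of the idea, the sketch below reduces a batch of observed responses to a small feature vector and matches it against known reference fingerprints. It is a minimal sketch, not the method used in this research: the feature set, the `REFUSAL_MARKERS` list, the `Observation` type, and the relative-distance metric are all assumptions chosen for brevity.

```python
import math
from dataclasses import dataclass
from statistics import mean

# Hypothetical refusal phrases; a real study would need a validated list.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i won't")


@dataclass
class Observation:
    text: str         # response body returned by the endpoint
    latency_s: float  # wall-clock response time in seconds


def fingerprint(observations):
    """Reduce a non-empty batch of observations to a small feature dict."""
    texts = [o.text.lower() for o in observations]
    words = [t.split() for t in texts]
    all_words = [w for ws in words for w in ws]
    return {
        "mean_words": mean(len(ws) for ws in words),
        "refusal_rate": mean(
            1.0 if any(m in t for m in REFUSAL_MARKERS) else 0.0 for t in texts
        ),
        "mean_latency_s": mean(o.latency_s for o in observations),
        "type_token_ratio": len(set(all_words)) / max(len(all_words), 1),
    }


def distance(a, b):
    """Euclidean distance over per-feature relative differences."""
    total = 0.0
    for key in a:
        denom = max(abs(a[key]), abs(b[key]), 1e-9)  # avoid division by zero
        total += (abs(a[key] - b[key]) / denom) ** 2
    return math.sqrt(total)


def closest_model(unknown, references):
    """Return the name of the reference fingerprint nearest to `unknown`."""
    return min(references, key=lambda name: distance(unknown, references[name]))
```

In use, `references` would map model names to fingerprints built from responses collected under the same prompts and network conditions as the unknown endpoint. Latency in particular depends on deployment and load, so it is a weak signal on its own.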

Why it matters

Organizations integrating LLMs into infrastructure have two blind spots: they cannot reliably detect concealed bias, and they have no visibility into whether a model's training reflects conflicting state or corporate agendas. The research demonstrates asymmetries that operate below the threshold of conventional safety evaluation — the same model serving materially different quality depending on who appears to be asking. For procurement and supply-chain risk in AI-dependent systems, that is the whole game.

Approach

Four methods: controlled adversarial prompting across contexts, to reveal guardrail asymmetries beyond simple refusal; multi-agent role-play, to observe behavior under pressure; extraction of behavioral markers; and convergence analysis across model architectures, to separate provider-specific signatures from emergent ones.
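One way to quantify an asymmetry between two contexts is to score the same prompts under each framing and test whether the per-prompt gap could plausibly be noise. The sketch below assumes a quality score per response already exists (how responses are graded is not specified here) and uses an exact sign-flip permutation test, a standard technique swapped in for illustration. The function names are hypothetical.

```python
from itertools import product
from statistics import mean


def paired_asymmetry(scores_a, scores_b):
    """Return (mean gap, per-prompt gaps) between framing A and framing B.

    scores_a[i] and scores_b[i] must be scores for the same prompt i.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired per prompt")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs), diffs


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip permutation test.

    Under the null hypothesis of no asymmetry, each per-prompt gap is
    equally likely to be positive or negative. Enumerates all 2**n sign
    assignments, so it is only practical for small n.
    """
    observed = abs(mean(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = [s * d for s, d in zip(signs, diffs)]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(flipped)) >= observed - 1e-12:
            hits += 1
    return hits / total
```

A small p-value only says the gap is unlikely under random sign flips. It does not say why the gap exists, and with many prompts a randomized approximation would be needed in place of full enumeration.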

Documented as reproducible experiments with full transcripts. Exploratory in scope, rigorous within it — positioned as foundational open research.