A security screening tool has two ways to fail. It can incorrectly identify a truthful person as deceptive — the false positive. Or it can incorrectly clear a deceptive person as truthful — the false negative. Both failure modes have operational consequences in defence contexts. False positives damage careers and erode trust in the screening process; false negatives allow threats to remain undetected. The polygraph's published accuracy record on both measures is the starting point for any rational procurement decision.
What false positive and false negative rates mean in security screening
In diagnostic testing, sensitivity measures the true positive rate — how often a genuinely deceptive subject is correctly identified. Specificity measures the true negative rate — how often a genuinely truthful subject is correctly cleared. Their complements are the false negative rate and the false positive rate respectively.
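These definitions reduce to simple ratios over a confusion matrix. The Python sketch below shows them. The class and field names are illustrative and do not come from any particular library, and the counts in the usage line are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a screening test scored against known ground truth."""
    true_positive: int   # deceptive subjects correctly flagged
    false_negative: int  # deceptive subjects wrongly cleared
    true_negative: int   # truthful subjects correctly cleared
    false_positive: int  # truthful subjects wrongly flagged

    @property
    def sensitivity(self) -> float:
        # True positive rate: share of deceptive subjects who are flagged.
        return self.true_positive / (self.true_positive + self.false_negative)

    @property
    def specificity(self) -> float:
        # True negative rate: share of truthful subjects who are cleared.
        return self.true_negative / (self.true_negative + self.false_positive)

    @property
    def false_negative_rate(self) -> float:
        # Complement of sensitivity.
        return 1.0 - self.sensitivity

    @property
    def false_positive_rate(self) -> float:
        # Complement of specificity.
        return 1.0 - self.specificity


# Hypothetical counts: 100 deceptive and 1,000 truthful subjects.
counts = ConfusionCounts(true_positive=85, false_negative=15,
                         true_negative=850, false_positive=150)
print(round(counts.false_positive_rate, 2))  # 0.15
```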
For a security screening tool, both matter. But they matter differently depending on the deployment context and the consequences of each error type. In national security contexts, a false negative that clears an active intelligence asset has consequences measured in classified data, agent lives, and strategic advantage. A false positive that flags an innocent officer for further investigation has consequences measured in careers, legal costs, and institutional trust.
The polygraph forces a tension between these error types: attempts to reduce false negatives (by making the test more sensitive to stress responses) tend to increase false positives (because genuine anxiety in innocent subjects also elevates the same physiological signals the test measures). There is no threshold setting that resolves this fundamental problem, because the polygraph measures a proxy — physiological arousal — rather than deception itself.
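The trade-off can be made concrete with a toy sweep over a single decision threshold. The arousal scores below are invented for illustration and do not come from any study. The sketch only shows that when the two score distributions overlap, moving the threshold exchanges one error type for the other.

```python
# Hypothetical arousal scores on an arbitrary 0-10 scale. The groups overlap
# because anxious truthful subjects also show elevated arousal.
TRUTHFUL = [2.0, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0, 8.0]
DECEPTIVE = [4.5, 5.5, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0]


def error_rates(threshold: float) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) when every score
    at or above `threshold` is called deceptive."""
    fp = sum(score >= threshold for score in TRUTHFUL)
    fn = sum(score < threshold for score in DECEPTIVE)
    return fp / len(TRUTHFUL), fn / len(DECEPTIVE)


for threshold in (4.0, 6.0, 8.0):
    fpr, fnr = error_rates(threshold)
    print(f"threshold={threshold:.1f}  FPR={fpr:.3f}  FNR={fnr:.3f}")
# threshold=4.0  FPR=0.625  FNR=0.000
# threshold=6.0  FPR=0.375  FNR=0.250
# threshold=8.0  FPR=0.125  FNR=0.625
```

No threshold in this toy data drives both rates to zero, because the two distributions overlap.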
The published accuracy figures
The most comprehensive independent review of polygraph accuracy is the 2003 National Academy of Sciences report The Polygraph and Lie Detection, commissioned by the US Department of Energy. The NAS reviewed 57 studies meeting their quality criteria.
Key findings from the NAS review:
- Median accuracy across studies was an AUC (area under the ROC curve) of 0.86 — better than chance but well below what would be required for high-stakes individual personnel decisions
- The best-performing studies were conducted under conditions more favourable to the polygraph than typical field use — staged mock crimes, cooperative subjects, controlled environments
- No studies provided direct evidence on the polygraph's accuracy for the specific task of personnel security screening
- The NAS concluded that the polygraph was not accurate enough to justify relying on it for employee security screening in federal agencies
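The AUC figure has a concrete interpretation. It is the probability that a randomly chosen deceptive subject scores higher than a randomly chosen truthful one, with ties counted as half. The Python sketch below implements that pairwise (Mann-Whitney) definition. The scores in the usage line are made up and are not NAS data.

```python
from itertools import product


def auc(deceptive_scores: list[float], truthful_scores: list[float]) -> float:
    """Pairwise estimate of the area under the ROC curve: the share of
    (deceptive, truthful) pairs in which the deceptive subject scores higher,
    counting ties as half a win."""
    wins = 0.0
    for d, t in product(deceptive_scores, truthful_scores):
        if d > t:
            wins += 1.0
        elif d == t:
            wins += 0.5
    return wins / (len(deceptive_scores) * len(truthful_scores))


print(auc([3, 4, 5], [1, 2, 4]))  # 7.5 wins out of 9 pairs
```

An AUC of 0.5 is chance and 1.0 is perfect separation. A value of 0.86 summarises discrimination across all thresholds. It does not by itself fix the false positive and false negative rates at any single operating point.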
For the Control Question Technique (CQT), the format used in most security screening, the NAS found laboratory false positive rates of approximately 10–20% and false negative rates of 10–20%. Under real field conditions, particularly when examinees are aware of the technique and have prepared, false negative rates are likely to be higher.
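"Polygraph testing yields an unacceptable choice for DOE employee security screening between too many loyal employees falsely judged deceptive and too many major security threats left undetected. Its accuracy in distinguishing actual or potential security violators from innocent test takers is insufficient to justify reliance on its use in employee security screening in federal agencies."
Source: National Academy of Sciences, The Polygraph and Lie Detection (2003)
The false positive problem: wrongly flagging the innocent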
A 15% false positive rate applied to 1,000 truthful subjects produces 150 false "deceptive" outcomes. Each one represents a person whose career, clearance, and reputation are put at risk on the basis of a physiological stress response that the polygraph cannot distinguish from actual deception.
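The same arithmetic extends to a population that contains a small number of deceptive subjects. The Python sketch below assumes a base rate of 10 deceptive subjects in 1,000 and a sensitivity of 0.85. Both values are hypothetical and were chosen for illustration; they are not figures from the NAS review.

```python
def expected_outcomes(population: int, deceptive: int,
                      sensitivity: float,
                      false_positive_rate: float) -> dict[str, float]:
    """Expected screening outcomes for a population containing `deceptive`
    deceptive subjects, assuming fixed sensitivity and false positive rate."""
    truthful = population - deceptive
    true_positives = deceptive * sensitivity
    false_positives = truthful * false_positive_rate
    flagged = true_positives + false_positives
    return {
        "flagged": flagged,
        "true_positives": true_positives,
        "false_positives": false_positives,
        # Positive predictive value: share of flagged subjects who are deceptive.
        "ppv": true_positives / flagged if flagged else 0.0,
    }


# The all-truthful case: 150 expected false positives.
print(expected_outcomes(1000, 0, sensitivity=0.85, false_positive_rate=0.15))
# Hypothetical base rate of 10 deceptive subjects in 1,000.
print(expected_outcomes(1000, 10, sensitivity=0.85, false_positive_rate=0.15))
```

Under these assumptions, about 157 people are flagged and only about 8.5 of them, in expectation, are deceptive. Roughly 95% of flags therefore fall on truthful subjects. When the base rate is low, even a modest false positive rate means that most flagged individuals are innocent.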
The polygraph measures peripheral nervous system arousal — respiratory rate, skin conductance, blood pressure, pulse. These signals elevate in response to anxiety, cognitive load, emotional significance, and anticipatory stress, as well as in response to deception. A truthful subject who is anxious about the screening process — aware of the professional consequences of a false result — will show elevated arousal at questions they are answering honestly. The polygraph has no mechanism to distinguish this from deception-related arousal.
Specific populations are known to produce systematically elevated false positive rates: individuals with anxiety disorders, individuals from cultures where authority-based interrogation carries particular historical associations, and individuals who have been briefed on the test process and are actively trying to pass — a cognitive effort that itself elevates arousal.
The false negative problem: clearing the deceptive
The false negative rate is harder to measure in real-world conditions, because ground truth — actual deception status — is rarely established definitively. Laboratory studies using mock crime paradigms provide estimates in the 10–20% range. Documented real-world cases provide the more sobering evidence.
Aldrich Ames passed two CIA polygraph examinations while conducting sustained espionage for the Soviet Union, and then Russia, over nine years. He reportedly received coaching from his KGB handlers on countermeasure techniques, specifically relaxation and self-justification strategies designed to reduce the differential arousal response at relevant questions. Robert Hanssen, who compromised US intelligence operations over 22 years, is often cited alongside Ames. However, the FBI never polygraphed him during his career, so his case says nothing about the test's accuracy.
The Ames case is not an anomaly. It reflects a systematic vulnerability. A test that measures physiological arousal as a proxy for deception can be defeated by any technique that either reduces arousal at relevant questions or increases arousal at control questions, and published literature on countermeasure techniques is readily available. The NAS review explicitly noted that the scientific literature provides no evidence that polygraph examiners can reliably detect countermeasure use.
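The countermeasure logic can be illustrated with a deliberately simplified comparison-question score. This is not a real polygraph scoring system, which relies on trained examiners or validated algorithms applied across several channels and charts. It is a hypothetical single-number stand-in. It assumes the call is "deceptive" when mean arousal at relevant questions exceeds mean arousal at control questions by more than a fixed, invented margin.

```python
from statistics import mean

# Hypothetical decision margin on an arbitrary arousal scale.
MARGIN = 1.0


def differential(relevant: list[float], control: list[float]) -> float:
    """Mean arousal at relevant questions minus mean arousal at control questions."""
    return mean(relevant) - mean(control)


def call(relevant: list[float], control: list[float]) -> str:
    """Toy decision rule: deceptive if relevant responses clearly dominate."""
    return "deceptive" if differential(relevant, control) > MARGIN else "not deceptive"


# A deceptive subject answering naturally reacts more strongly to relevant questions.
relevant = [7.0, 8.0, 7.5]
control = [4.0, 5.0, 4.5]
print(call(relevant, control))          # deceptive (differential = 3.0)

# Countermeasure 1: suppress arousal at relevant questions.
print(call([5.0, 5.5, 5.0], control))   # not deceptive
# Countermeasure 2: deliberately inflate arousal at control questions.
print(call(relevant, [7.0, 7.5, 7.0]))  # not deceptive
```

Either route shrinks the differential below the margin. Because the rule compares two arousal levels, the two kinds of manipulation are interchangeable from the test's point of view.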
UK defence procurement context
The UK's adoption of polygraph screening for specific National Security Vetting purposes — permitted under the Counter-Terrorism and Security Act 2015 and expanded to broader categories of security-cleared personnel — took place against the background of this accuracy evidence. The full scientific review of UK polygraph use examines the procurement basis in detail.
Two procurement considerations follow from the error rate evidence. First, any alternative assessment tool should be evaluated against the same two error metrics — false positive rate and false negative rate — rather than against the assumption that the polygraph provides a reliable baseline. Second, tools that measure direct behavioural and emotional signals rather than peripheral physiological proxies for arousal address the underlying measurement problem rather than working around it.
What FACS-grounded assessment addresses differently
EchoDepth's credibility assessment capability does not rely on a single physiological channel as a proxy for deception. Instead, it measures 44 FACS Action Units per frame — the direct muscular correlates of emotional state — and represents output as VAD dimensional scores (Valence, Arousal, Dominance) against individual baselines.
This addresses two specific failure modes of the polygraph. First, by using individual baseline deviation scoring rather than population-level thresholds, it reduces the false positive rate from anxiety-related arousal in innocent subjects — arousal relative to that individual's established baseline is the signal, not arousal relative to a population norm. Second, by measuring facial muscular activation directly rather than peripheral physiological proxies, it reduces susceptibility to relaxation-based countermeasures that work by suppressing peripheral nervous system arousal without suppressing the facial expressions that accompany emotional states.
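The difference between a population-level threshold and individual baseline deviation can be sketched in a few lines of Python. This is a hypothetical illustration of the general idea only, not EchoDepth's scoring method. The arousal values, the population cut-off of 6.0 and the deviation limit of 2.0 baseline standard deviations are all invented for the example.

```python
from statistics import mean, stdev

POPULATION_THRESHOLD = 6.0  # hypothetical fixed cut-off applied to everyone
Z_LIMIT = 2.0               # hypothetical deviation limit, in baseline SDs


def population_flag(score: float) -> bool:
    """Flag any score above a single population-wide threshold."""
    return score > POPULATION_THRESHOLD


def baseline_flag(score: float, baseline: list[float]) -> bool:
    """Flag a score only if it deviates strongly from this subject's own baseline."""
    z = (score - mean(baseline)) / stdev(baseline)
    return z > Z_LIMIT


# An anxious but truthful subject: high arousal throughout, including neutral questions.
anxious_baseline = [6.5, 7.0, 7.5, 7.0]
# A calm subject whose arousal spikes at one question.
calm_baseline = [2.0, 2.5, 3.0, 2.5]

print(population_flag(7.5), baseline_flag(7.5, anxious_baseline))  # True False
print(population_flag(5.0), baseline_flag(5.0, calm_baseline))     # False True
```

The fixed threshold flags the anxious subject and misses the calm subject's spike. The baseline-relative rule does the reverse, because it asks how unusual a response is for that person rather than for the population.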
EchoDepth produces per-finding structured output — AU evidence, VAD coordinates, confidence weights, baseline deviation magnitude — that supports review, escalation, and challenge. The output is documented evidence, not a binary deceptive/truthful label. See the full EchoDefence vs polygraph comparison for a structured side-by-side evaluation.
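A per-finding record of the kind described above could be represented as a small typed structure. The field names, types and example values below are assumptions made for illustration. They are not EchoDepth's actual output schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VAD:
    """Dimensional emotion coordinates (hypothetical representation)."""
    valence: float
    arousal: float
    dominance: float


@dataclass(frozen=True)
class Finding:
    """One reviewable finding: evidence plus the scores derived from it (hypothetical schema)."""
    question_id: str
    action_units: dict[str, float]  # FACS AU label -> activation intensity (hypothetical scale)
    vad: VAD
    confidence: float               # 0.0-1.0 weight attached to this finding
    baseline_deviation: float       # magnitude of deviation from the subject's baseline

    def to_json(self) -> str:
        # asdict() recurses into the nested VAD dataclass, so it serialises too.
        return json.dumps(asdict(self), sort_keys=True)


finding = Finding(
    question_id="Q7",
    action_units={"AU4": 0.6, "AU14": 0.3},
    vad=VAD(valence=-0.4, arousal=0.7, dominance=0.2),
    confidence=0.8,
    baseline_deviation=2.4,
)
print(finding.to_json())
```

Keeping the AU evidence alongside the derived scores is what lets a reviewer challenge a finding rather than accept a bare label.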
A credibility assessment approach with documented accuracy methodology
Per-finding AU evidence. VAD baseline deviation scoring. Individual-calibrated thresholds. SCIF-deployable. UK data residency.