Deepfake technology has reached the point where individual video frames of synthetic faces are visually indistinguishable from genuine photographs, both to humans and to many detection systems. The weakness of deepfake generators is not in individual frames but in the temporal relationships between frames. Genuine facial expressions follow physiologically constrained timing patterns that current diffusion models and GANs cannot faithfully replicate. This is the basis of EchoDepth's POKERFACE deepfake detection module.
Why Individual-Frame Detection Is No Longer Sufficient
Early deepfake detection methods focused on artefacts visible in individual frames: inconsistent lighting, blurring at face boundaries, texture anomalies, and eye blinking anomalies. As deepfake generation technology improved — particularly with the adoption of diffusion models — many of these frame-level artefacts became harder to detect reliably.
Research from multiple academic groups, including MIT CSAIL, has documented the rapid decline in frame-level detection accuracy as generation models improve. A detection method that achieves 95% accuracy against 2022-era deepfakes may perform substantially worse against 2025-era generations.
Temporal analysis addresses this limitation by examining properties that are constrained not by the sophistication of the generation model, but by the underlying physiology of the human face — constraints that AI generation models do not currently model.
The Neuromuscular Constraints Deepfakes Cannot Fake
Genuine facial expressions are produced by specific muscles, and those muscles have characteristic activation properties: onset rates, peak durations, offset rates, and fatigue patterns. The Facial Action Coding System (FACS) catalogues these at the level of individual Action Units.
A genuine Duchenne smile (AU6 + AU12) has characteristic temporal properties: AU12 (lip corner puller) activates faster than AU6 (cheek raiser); the peak is held for a duration correlated with the social context; and the offset follows a characteristic decay pattern. These timing relationships are governed by neuromuscular physiology — the speed at which specific muscle fibres contract and relax, the interplay between agonist and antagonist muscles, and the modulation of expression intensity over time.
Current deepfake generators operate primarily in the spatial domain — they produce spatially plausible frames. They do not model the temporal dynamics of individual AU activations with physiological accuracy. The result is that even visually plausible deepfake video tends to show AU activation patterns with temporal signatures that deviate from genuine expressions in measurable ways: abrupt onset/offset, implausible AU co-activation patterns, and temporal jitter in AU intensity that differs from genuine neuromuscular activation.
"Temporal analysis of facial action units provides detection robustness that is inherently tied to the physiology of the human face rather than to the specific failure modes of current generation models — making it more generalisable to future generation technologies."
— EchoDepth POKERFACE Technical Documentation, 2026
How POKERFACE Implements Temporal AU Analysis
EchoDepth's POKERFACE module processes video frame sequences rather than individual frames. For each sequence, it extracts 44 FACS-compliant Action Units per frame, builds temporal activation profiles for each AU, and analyses the resulting time-series against known distributions of genuine expression temporal dynamics.
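As a rough illustration of the per-AU profile analysis described above (this is a sketch, not the POKERFACE implementation; the function names, reference statistics, and 3-sigma cutoff are all assumptions), one detection signal — onset-rate deviation — might be computed like this:

```python
# Hypothetical sketch: building per-AU temporal activation profiles from
# per-frame AU intensity estimates and flagging AUs whose onset rate
# deviates from a reference distribution of genuine expressions.
import numpy as np

def onset_rate(profile: np.ndarray, fps: float) -> float:
    """Maximum frame-to-frame increase in AU intensity, in units/second."""
    return float(np.max(np.diff(profile)) * fps)

def flag_deviant_aus(au_profiles: dict[str, np.ndarray], fps: float,
                     reference: dict[str, tuple[float, float]]) -> list[str]:
    """Flag AUs whose onset rate falls outside a (mean, std) reference band."""
    flagged = []
    for au, profile in au_profiles.items():
        mean, std = reference[au]
        z = abs(onset_rate(profile, fps) - mean) / std
        if z > 3.0:  # illustrative 3-sigma cutoff, not a documented threshold
            flagged.append(au)
    return flagged
```

In a real pipeline the reference bands would be estimated per AU from a corpus of genuine expressions, and the same structure extends to offset rates and peak durations.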
Key detection signals include:
- AU onset/offset rate deviation: abrupt or implausibly smooth activation changes that differ from genuine muscle contraction dynamics
- AU co-activation implausibility: combinations of AUs that rarely co-occur in genuine expressions, such as AU12 (lip corner puller) and AU20 (lip stretcher) activating simultaneously
- AU intensity jitter: the characteristic micro-variations in muscle activation intensity that reflect genuine neuromuscular control, which are often absent in generated video
- Inter-AU timing relationships: the relative timing of different AU activations in compound expressions, which follow consistent physiological patterns in genuine faces
- Blink pattern analysis: blinking frequency and duration patterns that deviate from genuine baseline
POKERFACE produces a per-sequence confidence score with per-AU temporal deviation flagging — output that enables an analyst to understand not just whether a video is flagged, but which specific temporal patterns triggered the detection.
Applications in Defence and Intelligence
The primary defence and intelligence applications for deepfake detection include: verification of source video in intelligence assessments, screening of communications and video evidence for synthetic generation, counter-OSINT analysis, and verification of remotely-presented identities in credentialing and briefing contexts.
POKERFACE deploys fully on-premise with no cloud dependency — suitable for SCIF and air-gapped environments. Output is structured JSON with ISO 8601 timestamps, confidence weightings, and per-AU temporal deviation data — compatible with intelligence document formats and legal proceedings.
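To illustrate the shape such output might take (the field names and values below are invented for illustration and are not the documented POKERFACE schema):

```json
{
  "sequence_id": "example-0001",
  "analysed_at": "2026-01-15T09:30:00Z",
  "confidence": 0.93,
  "flagged_aus": [
    {"au": "AU06", "signal": "onset_rate_deviation", "deviation_sigma": 4.2},
    {"au": "AU12", "signal": "intensity_jitter_low", "deviation_sigma": 3.6}
  ]
}
```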
The system integrates with cyber security workflows via REST API and WebSocket, enabling automated deepfake screening of incoming video materials as part of existing SIEM and SOAR pipelines.
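A client-side sketch of such an integration might look like the following. The endpoint URL, payload fields, and module selector are assumptions for illustration; the actual POKERFACE API is not described here:

```python
# Hypothetical sketch of submitting a video for automated screening via a
# REST endpoint, using only the Python standard library. The URL, payload
# shape, and "au_temporal" selector are invented for this example.
import json
import urllib.request

def build_screening_request(video_url: str, callback: str) -> urllib.request.Request:
    payload = json.dumps({
        "video_url": video_url,
        "callback_url": callback,     # where the SOAR pipeline receives results
        "analysis": ["au_temporal"],  # illustrative module selector
    }).encode()
    return urllib.request.Request(
        "https://pokerface.internal/api/v1/screen",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A SOAR playbook would send the request with `urllib.request.urlopen` (or an async HTTP client) and route the callback payload into the alerting pipeline.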
Deepfake detection for defence, intelligence, and cyber security
POKERFACE — AU temporal coherence analysis. SCIF-compatible. Structured evidential output. Integrates with SIEM and SOAR platforms.