
© 2026 Patrick Attankurugu. Built with Next.js.

Computer Vision · KYC

Deepfake Detection in Production KYC: Lessons from 100K+ Verifications

January 2025·9 min read

Research papers report 99%+ deepfake detection accuracy. In production KYC, with real users, variable lighting, cheap smartphone cameras, and adversaries who adapt to your defenses, the reality is much harder. Here is what we learned from building and running a deepfake detection system across 100,000+ identity verifications.

Verifications processed: 100K+
Overall detection rate: 96.1%
False rejection rate: 0.3%
Total pipeline latency: <830ms

The Research-Production Gap

Academic benchmarks evaluate deepfake detection under controlled conditions: consistent lighting, high-resolution images, known generative models. The standard datasets (FaceForensics++, DFDC, Celeb-DF) are useful for research but do not represent what a production KYC system encounters daily.

Our verification pipeline processes selfie videos from whatever device the customer happens to own. That means 2-megapixel front cameras on $50 Android phones, fluorescent office lighting, direct sunlight washing out half the face, cracked screens introducing artifacts, and network conditions that compress video to barely recognizable quality. Every one of these conditions degrades detection accuracy in ways that benchmark datasets never capture.
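Captures this degraded are usually filtered before any deepfake analysis runs at all. As an illustration (not necessarily what our pipeline does internally), a common quality gate is the variance of the Laplacian of a grayscale frame: a blurry capture produces a flat Laplacian response, and frames below a threshold get rejected with a "retake" prompt instead of being fed to the detector. The kernel, threshold, and function names here are illustrative assumptions.

```python
import numpy as np

# Standard 3x3 discrete Laplacian kernel.
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=np.float64)

def sharpness_score(gray: np.ndarray) -> float:
    """Variance of the Laplacian response: low values suggest a blurry frame."""
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):               # correlate the kernel over the interior
        for dx in range(3):
            out += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())

def passes_quality_gate(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Reject frames too blurry for reliable texture analysis.
    The threshold is illustrative and would be tuned per device tier."""
    return sharpness_score(gray) >= threshold
```

In practice the same idea is usually run via an optimized library call rather than an explicit loop, but the gate logic is the same: measure high-frequency content, refuse to score frames that lack it.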

Research vs. Production: The Accuracy Gap

Research benchmarks (controlled): 99.2%
High-end smartphones, good lighting: 97.8%
Mid-range devices, indoor lighting: 96.1%
Low-end devices, poor lighting: 91.4%
Compressed video, poor network: 87.2%

What Attacks Actually Look Like

Before building the detection system, I expected sophisticated AI-generated deepfakes to be the primary threat. The reality was more mundane and more varied. Printed photos and screen replays account for over 70% of attacks. Sophisticated deepfakes are growing but still represent a minority.

Attack Types We See in Production

Printed photo replay · 42% of attacks · 99.7% detection rate · Low difficulty

Customer holds a printed photo in front of the camera. Detected by liveness checks (blink detection, head turn) and paper texture analysis.

The distribution matters for system design. Optimizing exclusively for cutting-edge GAN detection (which is what most research focuses on) would miss the majority of actual fraud attempts. Our architecture needed to handle everything from a photo taped to a cardboard cutout to a real-time face swap running on a GPU.

The Multi-Layered Detection Architecture

We designed the detection pipeline as a cascade: cheaper, faster checks run first and eliminate the easy cases. More expensive analysis only runs when earlier stages are inconclusive. This keeps average latency under 830ms while applying the full analytical battery to genuinely ambiguous cases.

Detection Pipeline Stages

Stage 1: Liveness detection. Active challenges (blink, smile, head turn) plus passive signals (eye reflection consistency, micro-saccades, skin blood flow). This catches 73% of all attacks before deeper analysis runs.

Techniques: 3D depth estimation · blink detection · challenge-response · rPPG blood flow
Average latency: 200ms
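Of the liveness signals, blink detection is the simplest to illustrate. A widely used formulation is the eye aspect ratio (EAR) of Soukupová and Čech: the ratio of vertical to horizontal eye-landmark distances collapses toward zero when the eye closes. The 6-point landmark ordering, thresholds, and helper names below are illustrative assumptions, not our production implementation.

```python
import numpy as np

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR over 6 eye landmarks (p0..p5, ordered around the eye outline).
    An open eye sits around 0.25-0.35; a closed eye drops toward 0."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal distance
    return float((v1 + v2) / (2.0 * h))

def count_blinks(ear_series, closed_thresh=0.21, min_frames=2):
    """Count blinks as runs of consecutive frames below the closed threshold.
    Thresholds are illustrative; real systems calibrate per session."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < closed_thresh:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks
```

A photo replay produces a flat EAR series with zero blinks, which is exactly what the active "please blink" challenge is designed to expose.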

Cascade Architecture: Early Exit Design

1. Liveness check (exits 73% of attacks): printed photos, basic screen replays
2. Texture analysis (exits 18% of attacks): sophisticated replays, paper masks
3. Temporal analysis (exits 5% of attacks): real-time face swaps
4. Frequency domain (exits 2.5% of attacks): GAN-generated, synthetic identities
5. Human review (exits 1.5% of attacks): ambiguous cases below confidence threshold
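The early-exit control flow can be sketched as a minimal cascade: each stage either returns a confident verdict (and the session exits there) or defers to the next, costlier stage, with human review as the final fallback. The stage checks and thresholds here are placeholder assumptions, not our actual models.

```python
from typing import Callable, Optional

# A stage returns "genuine" or "fraud" when confident, None when inconclusive.
Verdict = Optional[str]

class Cascade:
    """Early-exit cascade: cheap checks run first; later, costlier stages
    only see sessions the earlier stages could not decide."""

    def __init__(self, stages: list[tuple[str, Callable[[dict], Verdict]]]):
        self.stages = stages

    def run(self, session: dict) -> tuple[str, str]:
        for name, check in self.stages:
            verdict = check(session)
            if verdict is not None:
                return verdict, name          # early exit at this stage
        return "review", "human_review"       # fall through to manual review

# Placeholder stage logic over per-stage model scores in the session dict.
pipeline = Cascade([
    ("liveness", lambda s: "fraud" if s["liveness"] < 0.2 else None),
    ("texture",  lambda s: "fraud" if s["texture"] < 0.3 else None),
    ("temporal", lambda s: "genuine" if s["temporal"] > 0.8 else None),
])
```

The latency win comes from the ordering: because the first stage already decides the large majority of sessions, the expensive later stages contribute little to the average even though their per-call cost is high.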

The Skin Tone Problem

This deserves its own section because it is the single most underreported challenge in production face analysis for African markets. Most face analysis models are trained predominantly on lighter skin tones. The consequence is measurably worse performance on darker skin. In a continent where the vast majority of users have dark skin, this is not an edge case. It is the primary use case.

Our initial liveness detection model used rPPG (remote photoplethysmography) to detect blood flow patterns beneath the skin. It worked well for lighter complexions but degraded significantly for darker skin tones. The subtle color variations that indicate blood flow are harder to detect when melanin absorption is higher.

The fix required three changes: retraining on a dataset that properly represented African demographics (we collected and annotated 15,000 additional sessions), switching the rPPG analysis from RGB to the chrominance-based CHROM method which is more robust to melanin variation, and adding NIR (near-infrared) sensing for devices that support it.
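The CHROM step is worth unpacking. The method (de Haan and Jeunink) projects temporally normalized R, G, B traces from the skin region into two chrominance signals and combines them so that common-mode intensity changes cancel, leaving the pulse. The projection coefficients below are the standard published ones; the ROI extraction, function names, and the crude band-limited heart-rate estimate are illustrative assumptions.

```python
import numpy as np

def chrom_pulse(rgb_traces: np.ndarray) -> np.ndarray:
    """CHROM rPPG. rgb_traces is (T, 3): mean R, G, B of the skin ROI per frame.
    Returns the pulse signal S = X - alpha * Y over chrominance projections."""
    norm = rgb_traces / rgb_traces.mean(axis=0)     # normalize out baseline skin color
    r, g, b = norm[:, 0], norm[:, 1], norm[:, 2]
    x = 3.0 * r - 2.0 * g                           # standard CHROM projections
    y = 1.5 * r + g - 1.5 * b
    alpha = x.std() / y.std()                       # balance the projections
    return x - alpha * y

def estimate_bpm(pulse: np.ndarray, fps: float) -> float:
    """Dominant frequency within a plausible heart-rate band (40-180 bpm)."""
    spectrum = np.abs(np.fft.rfft(pulse - pulse.mean()))
    freqs = np.fft.rfftfreq(len(pulse), d=1.0 / fps)
    band = (freqs >= 40 / 60) & (freqs <= 180 / 60)
    return float(freqs[band][np.argmax(spectrum[band])] * 60.0)
```

The key property for our use case is the per-channel normalization: dividing each channel by its own temporal mean removes the melanin-driven baseline before the projection, which is a large part of why the chrominance formulation degrades less on darker skin than raw RGB differencing.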

False Rejection Rate by Skin Tone (Fitzpatrick Scale)

Before retraining: I-II (Light) 0.2% · III-IV (Medium) 0.5% · V-VI (Dark) 4.7%
After retraining: I-II (Light) 0.2% · III-IV (Medium) 0.3% · V-VI (Dark) 0.4%

More than a tenfold reduction in false rejections for dark skin tones after targeted retraining.

Adversarial Adaptation

Fraudsters learn. Within weeks of deploying a new detection method, we see adaptation. When we added blink detection, attackers started using videos instead of photos. When we caught screen replays via moire pattern analysis, they switched to higher-quality displays. When we added 3D depth estimation, they started using printed photos curved around cylindrical objects.

This arms race is the fundamental reason a single detection method is never sufficient. The multi-layered cascade architecture means that even when adversaries defeat one layer, subsequent layers catch the attempt. Because each layer uses fundamentally different signal types (geometric, textural, temporal, spectral), defeating all four simultaneously is substantially harder than defeating any one.
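To make the "fundamentally different signal types" concrete, here is a sketch of the kind of feature a frequency-domain layer works from: the radially averaged power spectrum of a face crop. Published work (e.g. Durall et al., Frank et al.) shows GAN upsampling tends to leave anomalies in the high-frequency tail of this profile. The bin count, the single-number ratio, and the names are illustrative assumptions; a real layer feeds the full profile to a trained classifier.

```python
import numpy as np

def radial_power_spectrum(gray: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a grayscale face crop."""
    f = np.fft.fftshift(np.fft.fft2(gray))
    power = np.log1p(np.abs(f) ** 2)
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - cy, xx - cx)        # distance of each pixel from DC
    bins = np.minimum((radius / radius.max() * n_bins).astype(int), n_bins - 1)
    return np.array([power[bins == i].mean() for i in range(n_bins)])

def high_freq_energy_ratio(gray: np.ndarray) -> float:
    """Fraction of log-spectral energy in the top quarter of frequencies:
    a crude single-number feature for a downstream classifier."""
    p = radial_power_spectrum(gray)
    return float(p[-len(p) // 4:].sum() / p.sum())
```

This signal is independent of the geometric and temporal layers: an attacker who fools the landmark-based liveness checks has done nothing to flatten the spectral fingerprint of their generator, which is what makes defeating all layers at once so much harder.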

Architecture Decisions That Mattered

On-device preprocessing, server-side detection

Face detection and quality assessment run on the device. Full deepfake analysis runs server-side where we control the compute environment. This keeps the client lightweight while maintaining detection accuracy.

Cascade over ensemble

An ensemble (running all models and voting) would give slightly better accuracy but at 3x the latency. The cascade design prioritizes speed for the common case (legitimate users) and reserves full analysis for suspicious cases.

Confidence scores, not binary decisions

The pipeline outputs a continuous confidence score, not a pass/fail. A $50 mobile money transfer might accept lower confidence than a $50,000 wire transfer. Risk-proportionate verification.
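At its simplest, risk-proportionate verification is a mapping from transaction value to a minimum detection confidence. The tiers and amounts below are illustrative, not the production policy.

```python
def required_confidence(amount_usd: float) -> float:
    """Minimum detection confidence required at a given transaction value.
    Tiers are illustrative placeholders."""
    if amount_usd < 100:
        return 0.80
    if amount_usd < 10_000:
        return 0.95
    return 0.99

def decide(confidence: float, amount_usd: float) -> str:
    """Approve outright, or route to step-up verification / human review."""
    return "approve" if confidence >= required_confidence(amount_usd) else "review"
```

The same score that waves through a small mobile money transfer sends a large wire transfer to review, which is the point: the decision boundary belongs to the risk policy, not the model.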

Continuous model retraining

Every confirmed fraud attempt becomes training data. The model retrains weekly with the latest attack patterns, ensuring the detection pipeline evolves alongside the threat landscape.

What Comes Next

The deepfake threat is accelerating. Open-source face swap tools are becoming trivially easy to use. Real-time face generation quality improves every few months. We are exploring three directions: multimodal verification combining face analysis with voice biometrics and behavioral signals, federated detection models trained across institutions without sharing raw biometrics, and hardware-level attestation using device secure enclaves to verify camera feed integrity before it reaches our pipeline.

The goal is not to build an unbreakable system. That does not exist. The goal is to make fraud economically irrational: raise the cost and complexity of a successful attack beyond the expected payoff. For the overwhelming majority of attempted fraud, a well-designed multi-layered pipeline achieves exactly that.

Patrick Attankurugu
Senior AI/ML Engineer specializing in computer vision for identity verification and deepfake detection across Africa.