AML · Machine Learning

Why Rule-Based AML Systems Fail — And How Machine Learning Changes Everything

March 2025 · 10 min read

Most financial institutions still rely on static rules and hard-coded thresholds to detect money laundering. The result? Thousands of false alerts, analyst fatigue, and sophisticated laundering patterns slipping through undetected. This post examines why — and what a machine learning approach looks like in practice.

The Scale of the Problem

Money laundering is not a fringe concern. The United Nations Office on Drugs and Crime estimates that between 2% and 5% of global GDP — roughly $800 billion to $2 trillion — is laundered annually. For Africa specifically, illicit financial flows drain an estimated $88.6 billion each year, undermining economic development and eroding public trust in financial systems.

Financial institutions are legally required to detect and report suspicious activity. The penalties for failure are severe: billions in fines, criminal prosecution of compliance officers, and loss of banking licenses. So banks invest heavily in transaction monitoring systems. And yet, the results are consistently disappointing.

The AML Detection Gap

95%+ of alerts are false positives
$2T laundered annually worldwide
<1% of illicit funds actually seized

That last number is the one that should trouble us most. Despite all the investment, all the compliance headcount, and all the technology spend, the global financial system intercepts less than 1% of laundered money. Something is fundamentally broken, and it starts with how detection systems are built.

How Traditional Rule-Based Systems Work

The architecture of a conventional AML transaction monitoring system is straightforward. Compliance teams, often working with vendors, define a set of rules — sometimes called scenarios or typologies — that describe suspicious behavior. Each rule has a threshold. When a transaction or pattern of transactions crosses a threshold, the system generates an alert.

Traditional Rule-Based Architecture

1. Transaction data ingested: batch or real-time feed from core banking
2. Rules engine evaluates: static thresholds applied (amount > X, frequency > Y, country = Z)
3. Alert generated: threshold breach triggers an alert in the case management queue
4. Analyst reviews: a human investigator manually decides to escalate or close

A typical rule might look like this: “Flag any single cash transaction exceeding $10,000.” Another: “Alert when a customer makes more than five international wire transfers in a 30-day window.” More sophisticated rules might look at velocity changes or peer group comparisons, but the logic remains the same — a fixed boundary triggers a binary decision.
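To make the mechanics concrete, here is a minimal sketch of the two example rules as code. The thresholds, field names, and transaction schema are all hypothetical, chosen only to illustrate how a rules engine reduces to fixed boundary checks:

```python
from datetime import datetime, timedelta

def rule_large_cash(txn):
    """Flag any single cash transaction exceeding $10,000."""
    return txn["type"] == "cash" and txn["amount"] > 10_000

def rule_wire_velocity(history, window_days=30, max_wires=5):
    """Alert when more than five international wires occur in a 30-day window."""
    cutoff = datetime.now() - timedelta(days=window_days)
    recent = [t for t in history
              if t["type"] == "intl_wire" and t["timestamp"] >= cutoff]
    return len(recent) > max_wires

# A $12,500 cash deposit trips the first rule; a $9,800 one does not.
txn = {"type": "cash", "amount": 12_500, "timestamp": datetime.now()}
print(rule_large_cash(txn))
```

Note how little context each check uses: the rule never asks who the customer is or what their normal behavior looks like, which is exactly the limitation discussed below.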

This approach made sense in the 1990s when the Bank Secrecy Act was being modernized and computational resources were limited. Rules are transparent, easy to explain to regulators, and deterministic. You can trace exactly why an alert fired. But three decades later, these systems are buckling under their own weight.

Five Reasons Rule-Based Systems Fail

1. The False Positive Catastrophe

The single biggest failure of rule-based AML is the false positive rate. Industry data consistently shows that 90-98% of alerts generated by conventional systems are false positives. A mid-sized bank might generate 10,000 alerts per month. If 95% are false, that means 9,500 alerts are wasted analyst hours — investigators manually pulling transaction histories, reviewing account documentation, and writing disposition notes for cases that were never suspicious in the first place.

What 95% False Positives Actually Looks Like

9,500 false vs. 500 real, out of 10,000 monthly alerts at a typical mid-sized bank

The downstream effects compound. Analysts develop “alert fatigue” — a well-documented phenomenon where investigators become desensitized to alerts because nearly everything they review turns out to be legitimate. When actual suspicious activity does surface, it gets the same cursory treatment as the thousands of false alarms that preceded it.

2. Static Thresholds in a Dynamic World

Money launderers are not static adversaries. They observe, adapt, and deliberately structure their activity to stay just beneath known thresholds. This practice, called “structuring” or “smurfing,” is as old as AML regulation itself. If the reporting threshold is $10,000, deposits start appearing at $9,500 or $9,800.

But the problem goes deeper than obvious threshold avoidance. Legitimate customer behavior varies enormously. A small business owner depositing $12,000 in weekend cash sales is not the same as a newly opened personal account receiving the same amount from an unknown source. A rule that treats these identically — because both cross the same threshold — is not detecting suspicion. It is measuring arithmetic.

Rules cannot adapt to seasonal variations, economic shifts, or individual customer context without manual recalibration. Every time the compliance team tunes a threshold, they face a tradeoff: lower it and drown in even more false positives; raise it and miss genuinely suspicious activity.

The Threshold Dilemma

Lower the threshold: catch more crime, but 10x more false positives.
Keep the current threshold: a 95% false positive rate and known gaps in coverage.
Raise the threshold: fewer false positives, but miss real laundering.

3. Rules Cannot See Networks

Modern money laundering is rarely a single-transaction event. It is a network activity — layering funds through multiple accounts, entities, and jurisdictions to obscure the origin. Consider a common pattern: funds move from Account A to Account B, then from B to Accounts C and D, which are held by shell companies registered in different countries. C and D then converge into Account E, which is held by a seemingly unrelated individual.

No individual transaction in this chain is necessarily suspicious. Each might be a modest amount, sent between ostensibly unrelated parties. A rule-based system evaluating each transaction in isolation — or even within the context of a single account — will miss this entirely. The suspicion only becomes apparent when you observe the network structure.

Layered Laundering: What Rules Miss

Illicit Source (A) → Account B → Shell Co. (C) and Shell Co. (D) → "Clean" Account (E)
Placement → Layering → Integration

Rule-based systems are fundamentally transaction-centric. They lack the ability to reason about relationships between accounts, trace fund flows across multiple hops, or identify convergence and divergence patterns that indicate layering schemes.
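The network view is not exotic machinery; even a few lines of graph traversal surface the divergence-then-reconvergence signature described above. This sketch hard-codes the toy A-to-E transfer graph (account names and edges are illustrative, not from any real system):

```python
from collections import defaultdict

# Toy transfer graph mirroring the layering pattern: A -> B -> C/D -> E.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]

graph = defaultdict(list)
for src, dst in edges:
    graph[src].append(dst)

def downstream(account, max_hops):
    """Accounts reachable from `account` within `max_hops` transfers (BFS)."""
    frontier, seen = {account}, set()
    for _ in range(max_hops):
        frontier = {nxt for acct in frontier for nxt in graph[acct]} - seen
        seen |= frontier
    return seen

def convergence_points(source, max_hops=3):
    """Downstream accounts fed by two or more distinct senders: the
    divergence-then-reconvergence signature of layering."""
    indegree = defaultdict(int)
    for acct in downstream(source, max_hops) | {source}:
        for nxt in graph[acct]:
            indegree[nxt] += 1
    return [a for a, d in indegree.items() if d >= 2]

print(convergence_points("A"))  # ['E']: E receives from both C and D
```

A transaction-centric rule sees five unremarkable transfers; the traversal sees one suspicious structure. Production systems would run this over millions of edges in a graph database, but the logic is the same.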

4. The Maintenance Burden

A mature AML program at a large institution might have 200 to 500 individual rules, each with its own thresholds, customer segment filters, and lookback periods. Every rule needs periodic tuning. Every new regulation, product launch, or emerging typology requires new rules to be written, tested, and deployed.

This creates a maintenance burden that grows superlinearly. Rules interact with each other in unpredictable ways — adjusting one rule's threshold can cascade alert volumes across other rules. Testing rule changes requires months of parallel running and backtesting. Meanwhile, regulators expect institutions to respond to emerging threats within weeks, not quarters.

The result is rule sprawl. Over time, institutions accumulate layers of rules that nobody fully understands, overlapping and sometimes contradicting each other. Decommissioning a rule feels risky — what if it was catching something important? — so rules tend to accumulate but rarely retire.

5. Inability to Learn from Investigation Outcomes

Perhaps the most damning limitation is that rule-based systems cannot learn. An analyst might close 9,500 false positive cases in a month, documenting in each case why the activity was legitimate. That institutional knowledge — the patterns of benign behavior, the contextual signals that distinguish real suspicion from noise — is locked in case narratives and analyst intuition. The rules engine never sees it.

The system that generated 95% false positives last month will generate 95% false positives next month. The only feedback loop is manual rule tuning, which is slow, expensive, and inherently limited by human ability to synthesize patterns across thousands of cases.

How Machine Learning Changes the Game

Machine learning approaches AML detection from a fundamentally different angle. Rather than encoding human-defined rules, ML systems learn patterns directly from data — including, critically, from the outcomes of past investigations. The shift is from “define what suspicious looks like” to “show the model what suspicious looks like and let it generalize.”

This is not about replacing compliance judgment. It is about augmenting it with pattern recognition capabilities that exceed what any team of analysts can achieve manually.

Rule-Based vs. Machine Learning: A Direct Comparison

Dimension | Rules | ML
False positive rate | 90-98% | 30-70% (model-dependent)
Adaptation | Manual rule tuning (weeks/months) | Continuous retraining (days/hours)
Network detection | Single-account, transaction-level | Cross-account, graph-aware
Feature space | 10-20 hand-picked variables | 100-500+ engineered features
Learning from outcomes | None | Explicit feedback loop
Explainability | Fully transparent | Requires SHAP/LIME interpretation

Supervised Learning: Teaching Models from Analyst Decisions

The most direct application of ML to AML is supervised learning. The idea is intuitive: take historical alerts, label each one with its investigation outcome (true positive or false positive), and train a classifier to distinguish between them.

The feature set matters enormously. Rather than the handful of variables that define a rule (amount, frequency, country), an ML model can incorporate hundreds of features: transaction velocity, time-of-day patterns, counterparty diversity, deviation from peer group behavior, account tenure, product mix, geographic entropy of fund flows, and many more. The model learns which combinations of features are predictive of genuine suspicion — combinations that no human analyst could manually specify.

Gradient-boosted tree models (XGBoost, LightGBM) have proven particularly effective in this domain. They handle mixed feature types well, are relatively robust to noisy labels, and provide feature importance rankings that aid explainability. In my own work building transaction monitoring systems, I have consistently seen these models reduce false positive rates by 60-80% while maintaining or improving true positive detection.

Supervised ML Pipeline for AML

1. Historical alerts: labeled outcomes from past investigations
2. Feature engineering: 500+ behavioral, transactional, and network features
3. Model training: XGBoost / LightGBM with class balancing
4. Risk scoring: continuous probability score per customer/transaction
5. Smart alerting: dynamic thresholds, ranked queues, auto-close low-risk

Unsupervised Learning: Finding What You Didn't Know to Look For

Supervised learning excels at improving detection of known patterns. But what about entirely novel laundering methods — the typologies that have never been seen before and therefore have no labeled examples?

This is where unsupervised anomaly detection fills a critical gap. Instead of learning “what does suspicious look like,” unsupervised models learn “what does normal look like” and flag deviations. Autoencoders, isolation forests, and clustering-based approaches can identify customers whose behavior suddenly diverges from their established baseline or from their peer group — without needing any labeled data.
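The simplest possible version of "learn what normal looks like" is a deviation score against a customer's own history. Production systems use isolation forests or autoencoders over many features, but this stdlib sketch (with hypothetical monthly transfer counts) captures the core idea:

```python
from statistics import mean, stdev

def anomaly_score(history, latest):
    """Z-score of the latest observation against the customer's own baseline."""
    mu, sigma = mean(history), stdev(history)
    return abs(latest - mu) / sigma if sigma else 0.0

# Hypothetical monthly international-transfer counts for one customer.
baseline = [2, 3, 1, 2, 4, 3, 2, 3, 2, 3, 2, 3]
print(anomaly_score(baseline, 3))   # a typical month scores low
print(anomaly_score(baseline, 18))  # a sudden spike scores very high
```

No label was needed to flag the spike; the customer's own history defined "normal." That is what lets unsupervised models catch typologies nobody has named yet.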

The combination is powerful. Supervised models handle the known typologies with high precision. Unsupervised models provide a safety net for emerging threats. Together, they cover a far larger threat surface than rules alone.

Dual-Model Detection Architecture

Supervised model:
- Known typologies (structuring, layering, etc.)
- Learns from analyst investigation outcomes
- High precision, low false positives
- Explainable via feature importance (SHAP)

Unsupervised model:
- Novel and emerging patterns
- No labeled data required
- Behavioral baseline deviation detection
- Catches what rules never anticipated

Combined output → ranked alert queue with contextual explanations

Graph Neural Networks: Seeing the Bigger Picture

The most exciting development in AML detection is the application of graph neural networks (GNNs) to transaction data. Where traditional models analyze customers and transactions as independent observations, GNNs operate on the transaction graph itself — the web of relationships between accounts, entities, and financial flows.

A GNN can learn that Account E is suspicious not because of any single transaction it received, but because the accounts two or three hops upstream exhibit characteristics consistent with layering. It can detect circular flows, rapid consolidation patterns, and community structures that suggest coordinated laundering networks.
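The core GNN intuition can be hand-rolled in a few lines: repeatedly update each account's score from the scores of the accounts that send to it. A real GNN learns the aggregation weights from data; here they are fixed, and the scores and edges are illustrative:

```python
from collections import defaultdict

# Toy transfer graph (A -> B -> C/D -> E) with initial per-account risk.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
risk = {"A": 0.9, "B": 0.1, "C": 0.1, "D": 0.1, "E": 0.0}

incoming = defaultdict(list)
for src, dst in edges:
    incoming[dst].append(src)

def propagate(risk, rounds=2, alpha=0.5):
    """Each round, blend a node's score with the mean score of its senders."""
    for _ in range(rounds):
        risk = {
            node: (1 - alpha) * score + alpha * (
                sum(risk[s] for s in incoming[node]) / len(incoming[node])
                if incoming[node] else score
            )
            for node, score in risk.items()
        }
    return risk

updated = propagate(risk)
print(round(updated["E"], 3))  # E's risk rises purely from upstream structure
```

Account E started at zero risk and no rule would ever touch it; after two rounds of propagation it inherits suspicion from the illicit source two hops upstream. Real GNNs do this with learned, multi-dimensional messages, but the "risk flows along edges" mechanic is the same.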

We used graph-based approaches in building the AfricaPEP system to map beneficial ownership networks across 54 countries. The network perspective revealed connections between politically exposed persons that would have been invisible in tabular data — shared directorships, familial connections through intermediary entities, and cross-border corporate structures designed to obscure the ultimate beneficial owner.

The Practical Reality: What Implementation Looks Like

Deploying ML for AML is not as simple as training a model and connecting it to your alert pipeline. There are real challenges that any institution needs to address.

The Label Problem

Supervised models need labeled data, and AML labels are inherently noisy. A “false positive” label means the analyst did not find enough evidence of suspicion within their investigation scope — it does not definitively mean the activity was legitimate. True positives are rare (often less than 2% of alerts), creating severe class imbalance. Confirmed money-laundering convictions, which would be the ideal positive label, are rarer still and arrive years after the fact.

Addressing this requires careful label engineering: using SAR filings rather than convictions as the positive class, applying stratified sampling and synthetic oversampling techniques (SMOTE), and supplementing labeled data with semi-supervised approaches that leverage the large volume of unlabeled transactions.
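The rebalancing step can be as simple as duplicating minority examples until the classes are even. SMOTE goes further by interpolating synthetic minority points in feature space; this stdlib sketch, on synthetic labels, shows only the naive oversampling baseline:

```python
import random

random.seed(0)
# Synthetic labeled alerts: 2% true positives, mirroring real imbalance.
labeled = [("txn_%d" % i, 1 if i < 20 else 0) for i in range(1000)]

positives = [ex for ex in labeled if ex[1] == 1]
negatives = [ex for ex in labeled if ex[1] == 0]

# Duplicate minority examples (with replacement) until classes are balanced.
oversampled = negatives + random.choices(positives, k=len(negatives))
random.shuffle(oversampled)

ratio = sum(y for _, y in oversampled) / len(oversampled)
print(f"positive share after oversampling: {ratio:.0%}")  # 50%
```

Oversampling is always applied to the training split only; leaking duplicated positives into the evaluation set would make the model look far better than it is.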

Explainability is Non-Negotiable

Regulators do not accept “the model said so” as justification for filing or not filing a suspicious activity report. Every alert must come with a human-readable explanation of why it was generated. This is where techniques like SHAP (SHapley Additive exPlanations) become essential.

SHAP values decompose a model's prediction into the contribution of each feature. For any given alert, you can say: “This customer was flagged primarily because (1) their international transfer velocity increased 400% relative to their six-month baseline, (2) 70% of transfers went to jurisdictions rated high-risk by FATF, and (3) the funds originated from a newly added counterparty with no prior relationship.” That level of explanation satisfies both the analyst reviewing the case and the regulator examining the program.
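Because SHAP explanations are additive (base rate plus per-feature contributions equals the prediction), turning them into an analyst-readable narrative is mechanical. The feature names and contribution values here are illustrative; in practice they would come from something like shap.TreeExplainer applied to the trained model:

```python
# Hypothetical portfolio-average score plus per-feature SHAP contributions.
base_rate = 0.23
contributions = {
    "international transfer velocity": +0.31,
    "high-risk jurisdiction ratio": +0.24,
    "new counterparty concentration": +0.18,
    "account tenure (15 years)": -0.09,
}

# Additivity: base rate plus contributions reconstructs the risk score.
score = base_rate + sum(contributions.values())
drivers = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))

print(f"Risk score: {score:.2f}")
for name, value in drivers:
    direction = "raised" if value > 0 else "lowered"
    print(f"  {name} {direction} the score by {abs(value):.2f}")
```

Note the negative contribution: a fifteen-year account history pulls the score down, and the narrative says so explicitly, which is exactly the kind of mitigating context regulators expect an explanation to include.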

Model Explainability: From Score to Narrative

Risk score: 0.87
Top contributing features (SHAP):
International transfer velocity: +0.31
High-risk jurisdiction ratio: +0.24
New counterparty concentration: +0.18
Account tenure (15 years): −0.09

The Hybrid Approach: Rules + ML

No serious practitioner advocates ripping out all rules and going pure ML overnight. The prudent approach — and the one regulators generally endorse — is a hybrid architecture where ML models augment rather than replace the existing rule set.

In practice, this means rules continue to operate as the regulatory baseline. Certain rules are mandated by law (e.g., the $10,000 CTR filing threshold in the US). ML models then layer on top, performing two functions: scoring rule-generated alerts to prioritize genuinely suspicious ones, and independently generating alerts for patterns that rules miss.
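The triage step described above is straightforward to sketch: every rule-generated alert carries an ML risk score, anything below a cutoff is auto-dispositioned, and the rest reach analysts ranked by risk. The alerts, scores, and cutoff here are all illustrative (and any auto-close threshold would need tuning and regulatory sign-off):

```python
# Hypothetical rule-generated alerts, each already scored by the ML layer.
alerts = [
    {"id": "A-101", "rule": "cash_over_10k", "ml_score": 0.12},
    {"id": "A-102", "rule": "wire_velocity", "ml_score": 0.91},
    {"id": "A-103", "rule": "cash_over_10k", "ml_score": 0.04},
    {"id": "A-104", "rule": "geo_risk", "ml_score": 0.67},
]
AUTO_CLOSE_BELOW = 0.10  # illustrative threshold

# Analyst queue: surviving alerts, highest risk first.
queue = sorted(
    (a for a in alerts if a["ml_score"] >= AUTO_CLOSE_BELOW),
    key=lambda a: -a["ml_score"],
)
auto_closed = [a["id"] for a in alerts if a["ml_score"] < AUTO_CLOSE_BELOW]

print([a["id"] for a in queue])  # ['A-102', 'A-104', 'A-101']
print(auto_closed)               # ['A-103']
```

The rules still fire on every mandated pattern, so the regulatory baseline is untouched; the ML layer only changes what analysts see first and what they never need to see at all.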

Hybrid Architecture: The Best of Both Worlds

Rule-based layer: regulatory mandates, known typologies, hard thresholds
ML layer: behavioral anomalies, network patterns, emerging threats
→ ML risk scoring and prioritization: all alerts scored and ranked, low-risk auto-dispositioned, high-risk fast-tracked
→ Analyst investigation queue: 70-80% fewer cases, higher conviction rate, SHAP explanations attached

The African Context: Why This Matters Here

Africa faces unique AML challenges that make the case for ML even more compelling. Mobile money volumes dwarf traditional banking transactions in many markets — Kenya's M-Pesa alone processes over 60 million transactions daily. Agent banking networks add another layer of complexity, with thousands of cash-in/cash-out points that rule-based systems struggle to monitor effectively.

Many African financial institutions operate with smaller compliance teams relative to transaction volumes. A bank processing millions of mobile money transactions cannot afford to have its five-person compliance team sifting through 10,000 false positive alerts. ML is not a nice-to-have in this context — it is the only path to effective compliance at scale.

Cross-border trade within Africa introduces additional complexity. The African Continental Free Trade Area (AfCFTA) is driving increased intra-African trade flows, but regulatory frameworks for AML vary significantly across the 54 member states. ML models that can learn jurisdiction-specific risk patterns while maintaining a continental view of cross-border flows represent the kind of adaptive intelligence that static rules simply cannot provide.

Looking Forward

The trajectory is clear. Over the next few years, we will see ML become the primary detection mechanism at forward-thinking institutions, with rules retained only where legally mandated. Several developments are accelerating this shift:

Federated learning for AML

Banks will collaboratively train models without sharing raw customer data, dramatically improving detection of cross-institutional laundering patterns.

Real-time graph analytics

As streaming graph databases mature, real-time network analysis will move from research into production, catching layering schemes as they unfold rather than days or weeks later.

Large language models for investigation

LLMs will assist analysts by summarizing case evidence, generating SAR narratives, and querying unstructured data sources — reducing investigation time from hours to minutes.

Regulatory modernization

Regulators are increasingly open to model-based approaches. FinCEN, the FCA, and MAS have all published guidance encouraging the use of innovative technology in AML programs.

The Bottom Line

Rule-based AML systems served their purpose in an era of limited computational capability and relatively simple laundering methods. That era is over. The adversaries have evolved, transaction volumes have exploded, and the complexity of global financial flows has outpaced what any set of static rules can meaningfully monitor.

Machine learning does not solve AML. No technology does. Money laundering is ultimately a human problem requiring human judgment. But ML fundamentally changes where that human judgment is applied. Instead of analysts wading through thousands of false alarms, they investigate a focused, high-quality queue of genuinely suspicious cases — each accompanied by data-driven explanations that accelerate the investigation.

The institutions that embrace this shift will not only improve their detection rates. They will do something more important: they will free their best compliance minds from mechanical alert review and redirect them toward the strategic, judgment-intensive work that actually stops financial crime. And in a continent where every dollar lost to illicit flows is a dollar that could have funded a school, a clinic, or an enterprise, getting this right matters beyond any compliance checkbox.

Patrick Attankurugu
Senior AI/ML Engineer at Agregar Technologies, specializing in KYC/AML systems, identity verification, and computer vision for financial compliance across Africa.