By Jean B. Voigt, Soteria Initiative, Executive Director
Cross-sectional analyses tentatively indicate that better FATF ratings are affecting the volume of SARs filed, recoveries, fines, and convictions. But how useful is that if we don’t know the true scale of illicit activity? Effectiveness scores, as measured by Immediate Outcomes (IOs) in the Basel AML Index, have stagnated or even declined according to the latest report [1]. Do increased recoveries signal improved effectiveness, or do they merely indicate heightened criminal activity?
AML, CTF, and sanctions effectiveness measures, and in consequence fraud measures, are misleading if we lack baseline data. Imagine testing a bridge merely by checking whether it can carry more weight each time, without knowing its maximum capacity. This is essentially how financial crime effectiveness is often evaluated.
More SARs filed? Higher recoveries? While superficially promising, these metrics can mislead. Criminals may intentionally flood the system with false positives, effectively launching a “DDoS attack” on financial institutions that overwhelms compliance teams and dilutes real threats. Alternatively, there may simply be more illicit activity, or system flaws may be driving the metrics.
We need to rethink testing frameworks in AML/CTF compliance, adopting rigorous, quantitative validations similar to engineering standards. Mutual evaluations and FATF Immediate Outcomes (IOs) are valuable, but they need to be complemented by systematic, controlled tests to provide a complete picture of effectiveness.
Some regulators, most recently the UK’s FCA [2], use data-driven thematic reviews and examinations as part of their AML supervision methodology. Their approach typically involves detailed onsite examinations, reviews of banks’ policies and procedures, transaction sampling, and an assessment of how risk-based controls are applied in practice. Despite these structured efforts, such reviews still fall short of simulating comprehensive real-world challenges. Critical factors such as investigator expertise, procedural rigor in reviews, and the systems’ overall ability to extract actionable intelligence from transactions are hard to examine without looking at the interplay of systems and operational processes. Further, real-world threats typically span multiple financial institutions and non-financial market participants, so a review of a single institution captures only part of the picture. Enhancing reviews such as those conducted by the FCA with more comprehensive, controlled real-world tests could significantly strengthen the effectiveness of fraud and AML compliance frameworks.
In cybersecurity, penetration testing and red-teaming have become standard practice. Financial crime prevention would greatly benefit from a similar approach. Real-time, controlled simulations could objectively measure how many simulated illicit transactions and entities are detected by AML frameworks, providing clear metrics of effectiveness. Academics recognized this concept more than five years ago [3], underscoring its potential to significantly improve financial crime effectiveness testing.
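As a rough illustration of what such a metric could look like, the sketch below computes a detection rate per typology for a set of seeded test cases, together with a 95% Wilson confidence interval so that small test batches are not over-interpreted. Everything in it is an assumption made for illustration: the `SeededCase` record, the typology labels, and the sample outcomes are hypothetical and not drawn from any real exercise or from the cited work.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SeededCase:
    """One simulated illicit transaction or entity planted in a controlled test (hypothetical schema)."""
    case_id: str
    typology: str    # e.g. "structuring", "mule_account" (illustrative labels)
    detected: bool   # True if the AML framework raised an alert on this case


def wilson_interval(hits: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = hits / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def detection_summary(cases: list[SeededCase]) -> dict[str, dict]:
    """Group seeded cases by typology and report the detection rate with its interval."""
    counts: dict[str, tuple[int, int]] = {}
    for case in cases:
        hits, trials = counts.get(case.typology, (0, 0))
        counts[case.typology] = (hits + int(case.detected), trials + 1)
    return {
        typology: {
            "seeded": trials,
            "detected": hits,
            "rate": hits / trials,
            "ci95": wilson_interval(hits, trials),
        }
        for typology, (hits, trials) in counts.items()
    }


if __name__ == "__main__":
    # Made-up outcomes: 10 seeded structuring cases, 5 of which were detected.
    sample = [SeededCase(f"s{i}", "structuring", i < 5) for i in range(10)]
    for typology, stats in detection_summary(sample).items():
        print(typology, stats)
```

Because the number of seeded cases is known in advance, the denominator is no longer a guess, which is precisely what metrics such as SAR volumes or recoveries lack. The interval also makes it visible when a test batch is too small to support a conclusion.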
Unfortunately, authentic red-teaming in financial crime compliance faces significant legal barriers. Activities such as creating false identities, executing illicit transactions, or facilitating unauthorized cross-border movements of funds involving multiple institutions directly violate laws like the Bank Secrecy Act (BSA) in the United States, the Proceeds of Crime Act (POCA) in the UK, and similar AML regulations globally. Additionally, such acts may fall under criminal conspiracy statutes or anti-money laundering provisions, creating severe legal risks for institutions and individuals involved. Given the evolution of legal frameworks, one consideration might be to evaluate whether law enforcement or competent authorities would have a sufficient mandate for such testing.
To genuinely assess AML effectiveness, we must understand what should be detected and rigorously measure how well systems detect it. Recent data [4] underscore the complexity. Recoveries, a seemingly solid indicator, remain difficult to benchmark against unknown actual illicit flows.
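The benchmarking problem can be made concrete with a small calculation. The sketch below uses entirely made-up figures (they are not taken from [4] or any other source) to show that recoveries can rise by half while the share of illicit flows actually recovered falls, simply because the unknown denominator grew faster.

```python
def recovery_rate(recovered: float, illicit_flow: float) -> float:
    """Share of the illicit flow that was recovered; both arguments in the same currency unit."""
    if illicit_flow <= 0:
        raise ValueError("illicit_flow must be positive")
    return recovered / illicit_flow


# Hypothetical figures for illustration only (say, millions of a currency unit).
year_1 = recovery_rate(recovered=100.0, illicit_flow=10_000.0)
year_2 = recovery_rate(recovered=150.0, illicit_flow=20_000.0)

# Recoveries increased by 50%, yet the recovered share of illicit flows dropped.
print(f"year 1: {year_1:.2%}, year 2: {year_2:.2%}")
```

In practice the `illicit_flow` argument is exactly the quantity nobody observes, which is why a rising numerator alone says little about effectiveness.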
Cross-sectional data do suggest that better FATF scores align with improved Immediate Outcomes. Yet recent global trends reported in the Basel AML Index briefing indicate stagnation or even decline in effectiveness scores worldwide [1].
The financial crime compliance community can and should embrace rigorous, quantitative testing alongside current evaluative approaches. Controlled simulations and automated, AI-agent-enabled tests can swiftly highlight vulnerabilities and strengths within AML frameworks, paving the way for clearer insights and stronger defenses. Once legal challenges are resolved, red-teaming could form a central pillar of this approach.
Given the global nature of the problem, and because this is far from a comprehensive legal review, the following should serve only as an indication of the legal challenges facing red-teaming activities and, similarly, highlight possible ideas for exemptions. The hope is that it prompts further research and a broader discussion of whether some of the good-faith provisions that exist for cybercrime research can be applied to financial crime research.
While live red-teaming may still be legally challenging in most circumstances, the involvement of law enforcement, or specific exemptions such as those available in the US, shows a path forward, especially when red-teaming can be performed within national borders.
Assessing a country's AML and fraud prevention readiness through such a quantitative testing approach is likely to uncover several layers of improvement, ranging from necessary policy changes to technical cooperation mechanisms between obligated entities.
If you’re exploring similar ideas or developing innovative approaches to AML effectiveness enhancement and testing, we’d love to connect and exchange ideas while we are collecting further feedback on such an approach.