Banking and Finance
The longest-running real-world deployment of machine learning anywhere. Where the question of whether an algorithm should decide your life arose first — and is still unresolved.
Banks were doing AI before "AI" was a word anyone outside a research lab used. The fraud detection system that watched your credit-card transactions in 2005 was a neural network trained on millions of historical transactions. The credit-card application you filed in 2008 was scored by a model that had been continuously updated on every previous applicant's repayment behaviour. Algorithmic trading replaced floor traders on most major exchanges by the early 2010s. Banking is the industry where machine learning grew up, and most of what we now talk about as "AI deployment risk" — opaque models, biased decisions, the impossibility of appeal — banks have been wrestling with for twenty years.
Fraud detection — the original ML use case
Every transaction you make on a credit card or bank card runs through a fraud-detection model in real time. The model has milliseconds to decide. It looks at the merchant, the amount, the time of day, the location, your recent spending pattern, the device used, the merchant's risk profile, the network's risk profile, and dozens of other features. It produces a score. If the score is too high, the transaction is declined or you get a verification SMS.
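In code, the decision path looks something like the sketch below. The feature names, score increments and cut-offs are invented for illustration; a production system uses a trained model over hundreds of features rather than hand-written rules.

```python
# Illustrative only: the shape of the real-time decision path, not any
# network's actual model or thresholds.
from dataclasses import dataclass

HIGH_RISK_MCCS = {"gambling", "crypto_exchange", "gift_cards"}  # invented list

@dataclass
class Transaction:
    amount: float
    merchant_category: str
    country: str
    hour_of_day: int

def risk_score(txn: Transaction, history: dict) -> float:
    """Stand-in for the trained model: returns a fraud score in [0, 1]."""
    score = 0.01
    if txn.country not in history["recent_countries"]:   # unfamiliar location
        score += 0.30
    if txn.amount > 5 * history["median_amount"]:         # far above typical spend
        score += 0.25
    if txn.merchant_category in HIGH_RISK_MCCS:
        score += 0.15
    if 1 <= txn.hour_of_day <= 4:                          # late-night purchase
        score += 0.10
    return min(score, 1.0)

def decide(txn: Transaction, history: dict) -> str:
    score = risk_score(txn, history)
    if score >= 0.60:
        return "decline"
    if score >= 0.30:
        return "step_up"    # trigger the verification SMS / push notification
    return "approve"
```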
The major card networks — Visa, Mastercard, American Express — sit at the heart of this. Visa's Advanced Authorization scores every VisaNet transaction in real time, on a network built to handle tens of thousands of transactions per second. The systems are now mostly deep neural networks, retrained constantly on the previous day's data. The reason fraud rates have stayed roughly flat while e-commerce has grown by orders of magnitude is that these models have kept up.
The trade-off is well known to anyone who has ever had a card declined while travelling. False positives are constant. The models tend to be conservative in unfamiliar contexts, and the costs of the two errors are asymmetric: a wrongly declined customer is annoyed, a missed fraud is a real loss. Tuning that asymmetry is most of the actual work of running these systems.
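One way to see what "tuning that asymmetry" means in practice: pick the decline threshold that minimises total expected cost on a validation set, with the two error types priced differently. The dollar figures below are placeholders, not real bank numbers.

```python
# Cost-sensitive threshold selection on a validation set. Costs are illustrative.
import numpy as np

COST_FALSE_POSITIVE = 5.0     # an annoyed customer, a support call
COST_FALSE_NEGATIVE = 400.0   # the average fraud loss actually booked

def total_cost(threshold: float, scores: np.ndarray, is_fraud: np.ndarray) -> float:
    declined = scores >= threshold
    false_positives = np.sum(declined & ~is_fraud)   # good customers declined
    missed_fraud = np.sum(~declined & is_fraud)      # fraud let through
    return (false_positives * COST_FALSE_POSITIVE
            + missed_fraud * COST_FALSE_NEGATIVE)

def best_threshold(scores: np.ndarray, is_fraud: np.ndarray) -> float:
    candidates = np.linspace(0.0, 1.0, 101)
    costs = [total_cost(t, scores, is_fraud) for t in candidates]
    return float(candidates[int(np.argmin(costs))])
```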
Credit decisioning — the place fairness questions started
If you apply for a credit card, a personal loan or a mortgage, an algorithm now scores your application. In Australia the credit bureaus (Equifax, Experian, illion) provide a comprehensive credit report and a credit score. Banks supplement that with their own internal models trained on their own customers. The mortgage decision is a function of your income, debts, recent credit behaviour, account-balance volatility, employment continuity and, increasingly, transactional features extracted from your bank statements via open-banking APIs.
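A toy scorecard in that spirit, with invented weights and cut-offs, might look like this:

```python
# Illustrative scorecard: the decision is a weighted function of the kinds of
# features listed above. Weights and the cut-off are made up for the sketch.
def application_score(income: float, existing_debt: float,
                      missed_payments_12m: int, balance_volatility: float,
                      years_in_job: float) -> float:
    debt_to_income = existing_debt / max(income, 1)
    score = 600
    score -= 150 * debt_to_income
    score -= 40 * missed_payments_12m
    score -= 80 * balance_volatility      # e.g. std dev of balance / mean balance
    score += 10 * min(years_in_job, 5)
    return score

def decision(score: float, cutoff: float = 550) -> str:
    return "approve" if score >= cutoff else "refer_to_manual_review"
```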
This is where the fairness debate first hit banking, decades before AI ethics was a research field. The problem is the same one that comes up everywhere: if you train a model to predict who will repay a loan using historical data from a society where some groups had less access to credit, you bake that history in. Even if you remove the protected variable (race, gender, postcode) from the features, the model can reconstruct it from correlated features. A 2018 study by US researchers found that algorithmic mortgage decisions were 40% less likely to discriminate against minority applicants than human loan officers — but the bias was not zero, and the algorithms were charging minority borrowers higher interest rates within the approved population.
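Two routine checks follow from that observation: measure whether approval rates differ materially across groups, and test whether the protected attribute can be reconstructed from the remaining features. The sketch below assumes a pandas DataFrame of scored applications; the column names and the 0.8 benchmark are illustrative conventions, not regulatory rules.

```python
# Two hedged fairness checks on a credit model's outputs.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def disparate_impact(df: pd.DataFrame, group_col: str, approved_col: str) -> float:
    """Ratio of the lowest group approval rate to the highest.
    Values below roughly 0.8 are the usual warning sign."""
    rates = df.groupby(group_col)[approved_col].mean()
    return rates.min() / rates.max()

def proxy_strength(df: pd.DataFrame, protected_col: str, feature_cols: list) -> float:
    """How well the other features predict the protected attribute.
    If this is high, dropping the attribute itself does not remove the bias."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, df[feature_cols], df[protected_col], cv=5).mean()
```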
The regulatory response in most jurisdictions has been to require lenders to be able to explain decisions, demonstrate that their models do not produce disparate impact, and let applicants challenge adverse decisions. In the US the Equal Credit Opportunity Act and the Fair Credit Reporting Act provide the framework. In Australia the National Consumer Credit Protection Act requires "responsible lending" and the Privacy Act gives consumers a right to challenge incorrect credit reports. None of this fully resolves the fairness question, and probably nothing can.
Anti-money-laundering — the other side of fraud
AML systems watch the same transaction stream that fraud-detection systems do, but for a different pattern: the movement of money in ways consistent with criminal proceeds, terrorism financing, sanctions evasion or tax fraud. Banks are required by law to file Suspicious Matter Reports (SMRs) when they form a suspicion about a customer or transaction, and separate routine reports on cash transactions above the reporting threshold and on international funds transfers.
The volume of these reports is enormous. AUSTRAC, Australia's financial intelligence agency, receives around 300,000 SMRs per year, plus well over a hundred million routine threshold transaction reports and international funds-transfer instructions. That volume is impossible to review manually, so AUSTRAC and the banks both run ML models to triage. The models look for patterns: rapid round-tripping between accounts, structuring (deliberately keeping transactions just under the reporting threshold), unusual jurisdictional flows, accounts that receive payments from many sources.
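A structuring screen of the kind described can be sketched as a simple rule before any ML is involved: flag accounts with repeated cash deposits just under the threshold inside a short window. The margin, window and hit count below are assumptions for illustration.

```python
# Illustrative structuring screen. Parameters are invented for the sketch.
import pandas as pd

THRESHOLD = 10_000       # AUD cash reporting threshold
NEAR_MARGIN = 0.10       # "just under" = within 10% of the threshold
WINDOW = pd.Timedelta(days=7)
MIN_HITS = 3

def flag_structuring(deposits: pd.DataFrame) -> set:
    """deposits: columns ['account', 'timestamp', 'amount'], cash deposits only."""
    near = deposits[
        (deposits["amount"] < THRESHOLD)
        & (deposits["amount"] >= THRESHOLD * (1 - NEAR_MARGIN))
    ]
    flagged = set()
    for account, group in near.groupby("account"):
        times = group["timestamp"].sort_values().reset_index(drop=True)
        # Flag the account if enough near-threshold deposits land in any window.
        for i in range(len(times)):
            start = times.iloc[i]
            in_window = times[(times >= start) & (times < start + WINDOW)]
            if len(in_window) >= MIN_HITS:
                flagged.add(account)
                break
    return flagged
```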
These models are imperfect in both directions. CommBank's $700 million AUSTRAC settlement in 2018 (at the time the largest civil penalty in Australian corporate history) was, at heart, a story of an AML system that did not catch what it should have: the bank's intelligent deposit machines were used by criminal syndicates to deposit cash in structured patterns for years before the bank's monitoring systems flagged them. Westpac's $1.3 billion settlement in 2020 was similar in kind: millions of routine international funds transfers that should have been reported to AUSTRAC were not.
Algorithmic trading — the part you do not see
Most equities trading on major exchanges is now algorithmic. High-frequency market makers like Citadel Securities, Virtu and Jane Street provide much of the liquidity, updating their quotes in microseconds, far faster than any human trader could. The actual decisions about what to buy and sell, and at what price, are made by models that have been refined for decades. This is one of the few areas where AI methods have unambiguously made markets more efficient: bid-ask spreads on liquid stocks are dramatically tighter than they were when human market makers dominated.
The risk is the same as it always was: market dynamics under stress are different from dynamics under normal conditions, and ML models trained on normal conditions can behave badly in crashes. The 2010 Flash Crash, in which the Dow Jones fell about 9% and recovered within roughly 36 minutes, was substantially driven by interactions between trading algorithms. The May 2022 cryptocurrency cascade was similar in kind. There is every reason to expect the next major market disruption to have algorithmic feedback loops in it somewhere.
Customer service — the part you do see
The chatbot on your bank's website is now LLM-based in most cases; five years ago it was a decision tree. ANZ, NAB, Westpac and CBA have all rolled out generative AI customer-service tools through 2024-2025. CBA's "Hey CommBank" was handling about 2,000 customer-service queries a day at launch. The economics are clear: a substantial fraction of customer queries are repetitive, the models are competent at the routine ones, and the marginal cost of running an LLM is far lower than the cost of a person.
The trade-offs are also clear. The models occasionally hallucinate. They have no memory across sessions unless it is deliberately built in. They cannot handle the genuinely novel or sensitive cases. The good ones know when to escalate to a human; the bad ones do not. Most banks have settled into a hybrid model: first contact is the bot, escalation paths are human, and the sensitive matters (financial difficulty, hardship, complaints) are routed straight to people.
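The routing logic of that hybrid model is simple to sketch. The intent names, confidence cut-off and sensitive-topic list below are assumptions for illustration, not any bank's actual configuration.

```python
# Hedged sketch of bot-first routing with human escalation paths.
SENSITIVE_INTENTS = {"financial_hardship", "complaint", "deceased_estate", "fraud_report"}
CONFIDENCE_FLOOR = 0.75

def route(intent: str, confidence: float, has_answer: bool) -> str:
    if intent in SENSITIVE_INTENTS:
        return "human_agent"            # never handled by the bot
    if confidence < CONFIDENCE_FLOOR or not has_answer:
        return "escalate_to_human"      # the good bots know when they do not know
    return "bot_reply"
```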
What the banks have learned
Banking is the closest thing to a working example of how to deploy AI at scale responsibly. Not because banks are virtuous, but because they are constrained by regulation, capital adequacy requirements, and the fact that bad models cost real money. The lessons that have settled out of twenty-plus years of doing this:
Models drift. A model that worked last year does not necessarily work this year. Continuous monitoring of input distributions, output distributions and downstream metrics is mandatory (see the sketch after this list).
Adversarial behaviour is constant. Fraudsters, money launderers and tax evaders adapt to whatever the bank's models flag. The models have to adapt back. The arms race never ends.
Explainability is a regulatory requirement, not a nice-to-have. Decisions that affect people have to be explainable. Models that cannot be explained get replaced, retrained or wrapped in a model that can.
Backtest, then test again before deployment, then monitor in production. A model that looks great in testing and then fails in production is a recurring story. The current generation of MLOps tooling exists in large part because banks built its predecessors to manage their own model risk.
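The drift check mentioned in the first lesson above is often as simple as a Population Stability Index (PSI) comparing a feature's distribution in production against the training baseline. A minimal sketch, with the conventional 0.1 / 0.25 alert bands noted in a comment:

```python
# Minimal PSI drift check. Bucket count and alert bands are conventions, not rules.
import numpy as np

def psi(baseline: np.ndarray, current: np.ndarray, buckets: int = 10) -> float:
    edges = np.linspace(baseline.min(), baseline.max(), buckets + 1)
    edges[0], edges[-1] = -np.inf, np.inf          # catch values outside the range
    base_frac = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_frac = np.histogram(current, bins=edges)[0] / len(current)
    base_frac = np.clip(base_frac, 1e-6, None)     # avoid division by zero
    curr_frac = np.clip(curr_frac, 1e-6, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))

# Rule of thumb: psi < 0.1 stable; 0.1-0.25 investigate; > 0.25 likely retrain.
```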
If there is a single takeaway from the banking sector's experience, it is that the work of deploying machine learning responsibly is mostly process and infrastructure, not models. The models are the easy part. Watching them, governing them, retraining them, validating them, and being able to roll back when they go wrong — that is where the actual cost lies. Most of the AI deployment failures elsewhere on this site come from organisations that did not do this work.