When people in our industry say "AI underwriting," they usually mean one of about six different things, and conflating them is how funders end up with stacks that don't work or vendors that don't deliver. This article is an attempt to walk through what we actually run, what is hype, and where the real performance gains come from.

What the manual process actually costs

Before getting into the AI piece, let's be honest about the baseline. A purely manual MCA underwrite (a human reading three or four months of PDFs, calling references, eyeballing for NSFs, and pricing the file) takes a competent underwriter somewhere between 45 minutes and two hours per file. At a fully loaded cost of $40–$70/hour, that's $30–$140 of pure underwriting labor per file. Worse, the decision quality is highly underwriter-dependent: two of your best people will rate the same file differently, and your default rate by underwriter usually has a wider spread than anyone is comfortable admitting.
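
To make the baseline concrete, here is the back-of-the-envelope math in runnable form; the file volume is an illustrative assumption, not a benchmark.

```python
# Back-of-the-envelope cost of a purely manual underwrite.
# Review times and hourly rates come from the ranges above;
# the 400 files/month volume is an assumption for illustration only.
hours_per_file = (0.75, 2.0)   # 45 minutes to 2 hours
loaded_rate = (40, 70)         # $/hour, fully loaded
files_per_month = 400          # illustrative assumption

low = hours_per_file[0] * loaded_rate[0]
high = hours_per_file[1] * loaded_rate[1]
print(f"Underwriting labor per file: ${low:.0f}-${high:.0f}")
print(f"Monthly underwriting labor: ${low * files_per_month:,.0f}-${high * files_per_month:,.0f}")
```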

The stack we actually run

Here is what a competitive 2026 underwriting pipeline looks like, layer by layer:

Layer 1: Verified, structured data retrieval

The single biggest improvement is replacing PDF bank statements with direct bank-data retrieval via Plaid, MX, or Finicity. You get structured transactions, accurate balance history, and a verified account holder. This alone eliminates the most common source of fraud (doctored statements) and 80% of the manual data entry. For files where the merchant won't connect their bank, an OCR + transaction-classification model on uploaded PDFs is the fallback, and it has gotten startlingly good.
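
As a rough illustration of what this layer replaces, the sketch below pulls structured transactions from Plaid's /transactions/get endpoint over plain HTTP. The endpoint, pagination, and response fields follow Plaid's public API documentation, but the credentials, date range, and error handling are placeholders, and MX or Finicity each need their own integration.

```python
import requests

PLAID_HOST = "https://production.plaid.com"  # use sandbox.plaid.com while testing

def fetch_transactions(client_id: str, secret: str, access_token: str,
                       start_date: str, end_date: str) -> list[dict]:
    """Pull structured transactions for a connected bank account.

    Returns Plaid's transaction objects (amount, date, name, category, ...),
    which feed straight into the Layer 2 classifier with no OCR step.
    """
    body = {
        "client_id": client_id,
        "secret": secret,
        "access_token": access_token,   # obtained earlier via Plaid Link
        "start_date": start_date,       # e.g. "2025-10-01"
        "end_date": end_date,           # e.g. "2026-01-01"
        "options": {"count": 500, "offset": 0},
    }
    transactions = []
    while True:
        resp = requests.post(f"{PLAID_HOST}/transactions/get", json=body, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        transactions.extend(data["transactions"])
        if len(transactions) >= data["total_transactions"]:
            return transactions
        body["options"]["offset"] = len(transactions)
```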

Layer 2: Transaction classification

This is the most under-appreciated piece. Raw bank transactions are a mess: "DEPOSIT" could be true revenue, an internal transfer between the operator's accounts, an SBA loan disbursement, a customer refund reversal, or a personal Venmo. A transaction-classification model labels each line so the downstream cash-flow calculation is actually meaningful: true revenue, transfers, refunds, MCA payments, payroll, taxes, owner draws, returns. Done well, this turns 90 days of bank data into a reliable picture of the merchant's true operating cash flow.
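
A minimal sketch of the idea, using character n-grams over the transaction description; the labels, training rows, and model choice here are illustrative stand-ins, and a production classifier would train on tens of thousands of hand-labeled lines plus features like amount sign and counterparty.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; real labels come from hand-reviewed lines.
descriptions = [
    "STRIPE TRANSFER ST-1234",           # card settlements -> true revenue
    "SQUARE INC DES:SQ* DEPOSIT",
    "ONLINE TRANSFER FROM SAV ...9921",  # internal transfer, not revenue
    "SBA EIDL LOAN DISBURSEMENT",        # loan proceeds, not revenue
    "ACH DEBIT FORWARD FINANCING",       # existing MCA payment
    "GUSTO PAYROLL 042211",
    "IRS USATAXPYMT",
    "VENMO PAYMENT JOHN SMITH",          # owner/personal activity
]
labels = [
    "true_revenue", "true_revenue", "transfer", "loan_proceeds",
    "mca_payment", "payroll", "taxes", "owner_draw",
]

# Character n-grams are robust to the noisy, abbreviated bank memo format.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(descriptions, labels)

print(clf.predict(["SQ* DEPOSIT 8842", "ZELLE TRANSFER TO CHECKING"]))
```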

Layer 3: Hard policy rules

Before any ML score runs, a policy engine should knock out anything that violates underwriting policy: minimum time in business, minimum monthly revenue, restricted industries, prior bankruptcy windows, current open positions that exceed your stacking tolerance. These are deterministic and explainable, which matters for fair-lending review and for explaining declines to brokers.
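
The rule layer can be as simple as a list of named predicates evaluated before any scoring; the thresholds below are placeholders for illustration, not recommended policy.

```python
from dataclasses import dataclass

@dataclass
class Application:
    months_in_business: int
    avg_monthly_revenue: float
    industry: str
    months_since_bankruptcy: int | None
    open_positions: int

# (rule name / adverse-action reason, predicate returning True on a violation).
# All thresholds are placeholders, not recommended policy.
POLICY_RULES = [
    ("Time in business below minimum",     lambda a: a.months_in_business < 12),
    ("Monthly revenue below minimum",      lambda a: a.avg_monthly_revenue < 15_000),
    ("Restricted industry",                lambda a: a.industry in {"crypto", "firearms", "adult"}),
    ("Bankruptcy within lookback window",  lambda a: a.months_since_bankruptcy is not None
                                                     and a.months_since_bankruptcy < 24),
    ("Open positions exceed stacking cap", lambda a: a.open_positions > 2),
]

def apply_policy(app: Application) -> list[str]:
    """Return the list of violated rules; an empty list means the file proceeds to scoring."""
    return [name for name, violated in POLICY_RULES if violated(app)]

declines = apply_policy(Application(8, 22_000.0, "restaurant", None, 3))
print(declines)  # ['Time in business below minimum', 'Open positions exceed stacking cap']
```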

Layer 4: ML scorecard

The actual risk score. In practice, gradient-boosted models (XGBoost, LightGBM) and well-tuned logistic regressions dominate. Neural networks are not where the value is for tabular data of this kind. The most predictive features are almost always: average daily balance trend over time, NSF days per month, end-of-day low balance, deposit count and consistency, revenue concentration, and existing-debt service.
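
A minimal sketch of what this layer looks like with LightGBM; the synthetic data stands in for a real labeled book of funded deals, and the feature names mirror the list above.

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

FEATURES = [
    "avg_daily_balance_trend", "nsf_days_per_month", "eod_low_balance",
    "deposit_count", "deposit_consistency", "revenue_concentration",
    "existing_debt_service_pct",
]

# Synthetic stand-in for a labeled book of funded deals (1 = defaulted).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(5000, len(FEATURES))), columns=FEATURES)
y = ((X["nsf_days_per_month"] + 0.5 * X["existing_debt_service_pct"]
      - X["avg_daily_balance_trend"] + rng.normal(size=5000)) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LGBMClassifier(n_estimators=300, learning_rate=0.05, num_leaves=31)
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]   # probability of default
print(f"Holdout AUC: {roc_auc_score(y_test, scores):.3f}")
```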

Layer 5: Human review on the edge

The model approves the clear approves, declines the clear declines, and routes the middle band, typically 15–30% of volume, to a human underwriter with a structured review template. The human's job is no longer arithmetic; it is judgment about things the model can't see: a recent change of ownership, an industry-specific seasonality the model is underweighting, a relationship-priced renewal.
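
The routing logic itself is deliberately boring; a sketch with placeholder cutoffs, which in practice are calibrated against your own score distribution and loss tolerance:

```python
def route_file(default_probability: float, requested_amount: float) -> dict:
    """Route a scored file to auto-approve, auto-decline, or human review.

    Cutoffs are placeholders; in practice they are set so the manual-review
    band captures roughly 15-30% of volume, and large tickets always get a
    human look regardless of score.
    """
    if requested_amount > 150_000:            # size override: always reviewed
        return {"decision": "manual_review", "reason": "amount above auto-decision cap"}
    if default_probability <= 0.08:
        return {"decision": "auto_approve", "reason": "clear approve band"}
    if default_probability >= 0.35:
        return {"decision": "auto_decline", "reason": "clear decline band"}
    return {
        "decision": "manual_review",
        "reason": "middle band",
        "review_template": ["ownership changes", "seasonality vs. model assumption",
                            "renewal / relationship pricing"],
    }

print(route_file(0.21, 60_000))
```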

[Image: Underwriting team reviewing flagged files together]
ML routes the clear cases; human underwriters focus on the middle band where judgment actually adds value.

What gains to actually expect

I want to be careful here, because the marketing claims in this space are out of control. After deploying an integrated pipeline like the one above (versus a previously manual baseline), realistic, repeatable gains are roughly:

  • 60–85% reduction in time-to-decision for clean files (minutes vs. hours).
  • 30–50% reduction in underwriting labor cost per funded deal.
  • 10–30% reduction in default rate at a constant approval rate, or the ability to hold defaults flat while approving a wider band of files.
  • 2–4x increase in throughput per underwriter, mostly by removing repetitive work.
  • Meaningful reductions in fraud losses, primarily from verified bank data replacing PDF statements.

The numbers you'll see in pitch decks (70% default reduction, 10x throughput) usually come from comparing the new system to an extremely undisciplined old system. Be skeptical.

Fair lending has to be designed in, not bolted on

The Equal Credit Opportunity Act and Regulation B apply to commercial credit, not just consumer credit, and Dodd-Frank Section 1071 will add formal small-business data-collection obligations on top. The practical requirements for your ML pipeline are: documented feature lists with rationale, monitoring of approval and pricing disparities by protected class (where data is collected appropriately), model governance with version control, and an adverse-action notice process that gives a real reason, not just "model declined." For the full regulatory picture see our compliance guide.
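
One concrete piece of that monitoring is an adverse impact ratio on approval rates by group. A minimal sketch, assuming you already have decision outcomes joined to appropriately collected demographic data; the column names and sample rows are illustrative.

```python
import pandas as pd

def adverse_impact_ratios(df: pd.DataFrame, group_col: str = "group",
                          approved_col: str = "approved") -> pd.Series:
    """Approval rate of each group divided by the highest-approval group's rate.

    Ratios well below ~0.80 (the classic four-fifths rule of thumb) warrant
    investigation; this is a monitoring signal, not a legal determination.
    """
    rates = df.groupby(group_col)[approved_col].mean()
    return (rates / rates.max()).sort_values()

# Illustrative data; real monitoring runs on production decisions every month.
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"],
    "approved": [1,   1,   0,   1,   0,   0,   1,   1,   1,   1],
})
print(adverse_impact_ratios(decisions))
```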

Where AI is over-sold

A few claims to push back on next time you hear them in a vendor pitch:

  • "Our LLM reads bank statements end-to-end." LLMs are useful for free-text reasoning, but for high-volume, regulated bank-data classification, you want a small, fast, monitored classifier, not a $0.02-per-call general model that will hallucinate on edge cases.
  • "AI eliminates the need for underwriters." No serious lender we know runs zero-human underwriting in production for non-trivial deal sizes. The economics break long before regulators do.
  • "Our model is 95% accurate." Accuracy is the wrong metric on imbalanced default data. Ask for AUC, KS, and gains-chart-by-decile. Then ask to see vintage performance.

If you are just starting

If you are running mostly-manual underwriting today, you do not need to deploy a full ML stack on day one. The highest-ROI sequence is usually: (1) plug in verified bank-data retrieval to eliminate doctored statements and manual entry; (2) add a transaction classifier to get clean cash-flow numbers; (3) write tight, deterministic policy rules (most of the discipline win is here); (4) layer a scorecard on top once you have clean enough data to train one. Steps 1–3 alone typically deliver most of the speed gain.

We've helped funders walk through this exact sequence as part of our underwriting platform. If you want to compare notes on what works, get in touch.

Frequently asked questions

Do I need to be a data-science shop to deploy ML underwriting?

No. The infrastructure tier of this market (bank-data aggregators, transaction-classification APIs, and turnkey scorecard vendors) has matured to the point that a small ops team can wire together a credible pipeline without hiring data scientists. You will still want a model-governance partner or consultant for the scorecard layer.

How do I know if my current model is any good?

Look at vintage performance by approval-score decile over at least 12 months. The top decile should be materially better than the middle, and the bottom should be the riskiest. If the curve is flat, your model is not adding information; it's just adding latency.
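
A minimal sketch of that check, assuming a DataFrame of funded deals with a funding month, the approval score at funding, and an eventual default flag; column names are illustrative.

```python
import pandas as pd

def vintage_decile_curve(deals: pd.DataFrame) -> pd.DataFrame:
    """Default rate by approval-score decile for each funding vintage.

    Expects columns: funded_month (e.g. '2025-03'), score (higher = better),
    defaulted (0/1). Decile 10 holds the highest scores and should show
    materially lower default rates than the middle in every vintage; if the
    rows are flat, the model is adding latency, not information.
    """
    deals = deals.copy()
    deals["decile"] = deals.groupby("funded_month")["score"].transform(
        lambda s: pd.qcut(s.rank(method="first"), 10, labels=False) + 1
    )
    return deals.pivot_table(index="funded_month", columns="decile",
                             values="defaulted", aggfunc="mean")
```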

What is the right way to handle adverse-action notices when a model declines?

Map the model's top-contributing features for the individual decision (using SHAP values or a similar attribution method) to a short list of plain-English reasons defined in advance, and document the mapping. Avoid generic "failed our credit policy" language; it will not survive a fair-lending review.
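
A minimal sketch of the mapping step, assuming per-feature contributions for the declined file have already been computed upstream (for example with shap.TreeExplainer or LightGBM's native pred_contrib option); the feature names and reason codes here are illustrative.

```python
# Pre-defined, documented mapping from model features to plain-English reasons.
REASON_CODES = {
    "nsf_days_per_month":        "Frequent insufficient-funds activity on the business bank account",
    "avg_daily_balance_trend":   "Declining average daily bank balance over the review period",
    "existing_debt_service_pct": "Existing debt payments are high relative to revenue",
    "deposit_consistency":       "Irregular or inconsistent deposit activity",
    "revenue_concentration":     "Revenue concentrated in too few customers or sources",
}

def adverse_action_reasons(contributions: dict[str, float], top_n: int = 3) -> list[str]:
    """Top plain-English reasons for a decline, from per-feature contributions.

    `contributions` maps feature name -> contribution to this applicant's
    default score (positive = pushed the file toward decline), e.g. SHAP values.
    Features without a pre-approved reason code are skipped so every notice
    uses only documented language.
    """
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    reasons = [REASON_CODES[f] for f, v in ranked if v > 0 and f in REASON_CODES]
    return reasons[:top_n]

# Illustrative contributions for one declined file.
print(adverse_action_reasons({
    "nsf_days_per_month": 0.42, "existing_debt_service_pct": 0.18,
    "avg_daily_balance_trend": 0.07, "deposit_count": -0.11,
}))
```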

Should I build or buy the underwriting platform?

Most funders should buy the platform and own the policy. Building the workflow, data integrations, and audit infrastructure is a multi-million-dollar undertaking with no competitive advantage. Your underwriting policy, your data, and your post-funding behavior models are where you actually differentiate.

Sources & further reading