Where Stripe dispute evidence moves the needle

Giuseppe Giona·11 May 2026

Summary

• 3DS liability shift only covers disputes filed with reason “fraudulent.” For service-business disputes filed as “product_unacceptable,” “subscription_canceled,” or “general,” 3DS contributes nothing.
• The evidence categories that carry outsized weight are: consent records under EU CRD Art 16(m), service-completion evidence with timestamps, and the count of distinct customer engagements.
• A weighted-evidence scoreboard tells you, before replying, whether the dispute is winnable on what you currently have. If it isn’t, the move is to refund and fix the intake, not to write a longer narrative.
• Volume management (staying below the current network monitoring thresholds — Visa’s VAMP and Mastercard’s CMP) matters more than per-dispute win rate. The specific numbers move; the monitoring pattern doesn’t.
• Webhook handlers for dispute events have to be idempotent. Receiver writes to a queue; a cron-driven processor reconciles. Inline replies in the receiver lose to retry storms.

The 3DS misconception

3D Secure 2.x is sold, in the marketing materials, as broad dispute protection. It is not. EMV 3DS’s liability shift only applies to disputes filed under fraud reason codes; in Stripe’s schema, that is the dispute reason “fraudulent.” Every other dispute reason (“product_not_received,” “product_unacceptable,” “subscription_canceled,” “credit_not_processed,” “duplicate,” “general”) sees no liability shift from 3DS. A customer who completed a 3DS challenge can still file a chargeback for “service not as described,” and the issuer has no shift to apply.

Service-business disputes are mostly not filed as fraud. They are filed as product or service complaints. The proportion you observe depends on your category, but treating 3DS as your primary defense for non-fraud disputes is building defense for disputes you are not getting.

Figure 1: 3DS liability shift coverage by Stripe dispute reason

EMV 3DS’s liability shift is defined against fraud reason codes only — in Stripe’s reason taxonomy, that’s the single value “fraudulent”. The other six reason codes a service merchant typically sees in production receive nothing from completing a 3DS challenge. Architecting dispute defence around 3DS alone — rather than around the consent / completion / engagement evidence Stripe’s submission schema is built to weight — is preparing for a single column of this chart and leaving the other six exposed. Source taxonomy: docs.stripe.com/disputes/categories.
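A minimal sketch of the same point in code, useful for checking how much of your own dispute history 3DS could ever have helped with. The function names are hypothetical; the reason strings are the ones listed above.

```python
# Only the fraud reason receives a liability shift, per the section above.
LIABILITY_SHIFT_REASONS = {"fraudulent"}


def three_ds_helps(reason: str, three_ds_completed: bool) -> bool:
    """True only when a completed 3DS challenge can shift liability."""
    return three_ds_completed and reason in LIABILITY_SHIFT_REASONS


def uncovered_share(reasons: list[str]) -> float:
    """Fraction of a dispute history that 3DS does nothing for."""
    if not reasons:
        return 0.0
    uncovered = sum(1 for r in reasons if r not in LIABILITY_SHIFT_REASONS)
    return uncovered / len(reasons)


history = ["fraudulent", "general", "subscription_canceled", "product_unacceptable"]
share = uncovered_share(history)
```

Run against a real export of dispute reasons, the uncovered share is the part of your volume that needs consent, completion, and engagement evidence instead.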

What carries weight

Stripe’s dispute evidence submission schema is a long list of optional fields. The fields are not equal. Three groups carry disproportionate weight in network outcomes.

Consent records. What did the customer specifically agree to, when, and through what interface? Under EU CRD 2011/83/EU Article 16(m), services that begin before the 14-day withdrawal window expires require the customer to expressly request immediate performance and waive their right of withdrawal, both on a durable medium. Without an archived record of that consent (ideally text the customer typed, or a specific checkbox with the policy text snapshotted at that moment), right-of-withdrawal defences from cardholders run almost uncontested. The same logic applies under the UK’s Consumer Contracts Regulations 2013.
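One way to archive such a record is sketched below. The field names and the ConsentRecord type are assumptions made for illustration, not a Stripe schema and not legal advice on what satisfies the directive; the idea is only to store the exact policy text, a digest of it, and both consent flags at the moment of capture.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentRecord:
    customer_id: str
    policy_text: str                       # exact wording shown at consent time
    policy_sha256: str                     # digest showing the text was not edited later
    requested_immediate_performance: bool
    waived_withdrawal_right: bool
    typed_confirmation: str                # what the customer typed, if anything
    interface: str                         # e.g. "web-checkout" (hypothetical label)
    captured_at: str                       # ISO 8601 timestamp in UTC


def capture_consent(customer_id, policy_text, requested, waived,
                    typed_confirmation, interface, now=None):
    """Snapshot the policy text and both consent flags in one immutable record."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(policy_text.encode("utf-8")).hexdigest()
    return ConsentRecord(customer_id, policy_text, digest, requested, waived,
                         typed_confirmation, interface, now.isoformat())


def is_complete(record: ConsentRecord) -> bool:
    """Both the express request and the waiver must be present."""
    return record.requested_immediate_performance and record.waived_withdrawal_right


record = capture_consent(
    "cus_example", "Work starts immediately; I waive my right of withdrawal.",
    True, True, "I agree", "web-checkout",
    now=datetime(2026, 5, 11, 9, 0, tzinfo=timezone.utc),
)
archived = json.dumps(asdict(record))  # what would be written to durable storage
```

Storing the text itself, not a link to the current policy page, is the design choice that matters: the page will change, the record must not.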

Service-completion evidence with timestamps. The dispute schema has fields for “service date,” “customer purchase IP,” “access activity log,” and “billing address.” The ones the networks weight are completion and delivery: a portal access log showing the customer logged in after the service date, files downloaded with timestamps, an explicit receipt confirmation from the customer’s own account. Pre-purchase evidence (the customer’s IP at checkout) is weak by comparison.

Customer engagement count. The number of distinct, documented communications with the customer, across distinct channels, predicts outcomes better than the prose of any single email. Three touchpoints in different channels (email, support ticket, dashboard message) outweigh ten exchanges in one thread. The reading from the issuer side is “this was a real, ongoing relationship,” not “a transaction we can’t verify.”
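The distinction between message volume and distinct channels is easy to compute. The sketch below assumes a hypothetical Touchpoint record with a channel label and a thread identifier; how you populate it depends on your own support tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Touchpoint:
    channel: str    # e.g. "email", "support_ticket", "dashboard_message"
    thread_id: str  # identifier of the conversation within that channel


def engagement_summary(touchpoints):
    """Count distinct channels and threads separately from raw message volume."""
    channels = {t.channel for t in touchpoints}
    threads = {(t.channel, t.thread_id) for t in touchpoints}
    return {
        "distinct_channels": len(channels),
        "distinct_threads": len(threads),
        "messages": len(touchpoints),
    }


one_long_thread = [Touchpoint("email", "t1") for _ in range(10)]
three_channels = [
    Touchpoint("email", "t1"),
    Touchpoint("support_ticket", "s7"),
    Touchpoint("dashboard_message", "d3"),
]
long_summary = engagement_summary(one_long_thread)
spread_summary = engagement_summary(three_channels)
```

Ten emails in one thread score one distinct channel; three messages across three channels score three, which is the number the paragraph above argues for tracking.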

A weighted-evidence model

A useful internal mental model: every dispute starts at zero. Each present piece of evidence adds weighted points; each present negative signal subtracts. Cap the maximum well below 100 — no dispute is certain to win, and a model that predicts “100” is hiding its tails. Weights come from two sources: Stripe’s own published guidance on what’s persuasive in each category, and your own outcome history if you have enough disputes to draw a curve.

What the model is for: it tells you, before you reply, whether the dispute is winnable on the evidence you currently have. If the score is below your historical loss threshold, the rational move is to refund (and update intake so the next customer doesn’t lose the same evidence in the same place), not to write a longer narrative. Better narratives move outcomes at the margin. The structural evidence determines almost everything else.

Negative signals matter more than people assume. An issued refund weighs strongly against you — the network reads it as the merchant agreeing there’s an issue. A typed signature whose name doesn’t match the account name is read as evidence of a credential or impersonation problem. A prior dispute count above one signals a chargeback pattern from that cardholder — it weakens, not strengthens, the case.
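A minimal sketch of such a scoreboard follows. Every weight, the cap of 85, and the key names are placeholder assumptions chosen for illustration; real values would come from Stripe’s published guidance and your own outcome history, as described above.

```python
# Hypothetical weights: replace with values fitted to your own outcomes.
POSITIVE_WEIGHTS = {
    "consent_record": 35,
    "completion_evidence": 35,
    "multi_channel_engagement": 20,
    "refund_policy_disclosed": 5,
}
NEGATIVE_WEIGHTS = {
    "refund_issued": 40,
    "signature_name_mismatch": 20,
    "prior_disputes_by_cardholder": 10,
}
MAX_SCORE = 85  # capped well below 100: no dispute is certain to win


def score(present_evidence: set, negative_signals: set) -> int:
    """Start at zero, add present evidence, subtract negatives, clamp to [0, MAX_SCORE]."""
    total = sum(w for key, w in POSITIVE_WEIGHTS.items() if key in present_evidence)
    total -= sum(w for key, w in NEGATIVE_WEIGHTS.items() if key in negative_signals)
    return max(0, min(MAX_SCORE, total))


def recommend(score_value: int, loss_threshold: int) -> str:
    """Below the historical loss threshold, refund and fix intake instead of replying."""
    return "contest" if score_value >= loss_threshold else "refund_and_fix_intake"


strong = score(set(POSITIVE_WEIGHTS), set())
weak = score({"consent_record"}, {"refund_issued"})
middling = score({"completion_evidence", "multi_channel_engagement"}, set())
```

With these placeholder weights, a fully evidenced dispute hits the cap, while a consent record paired with an issued refund scores zero and gets routed to the refund path.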

Volume is the actual game

Even a 100% per-dispute win rate doesn’t save you if your dispute rate crosses the network monitoring thresholds. Both networks publish merchant-level programs and revise them on a multi-year cycle, so it’s worth naming the current shape rather than memorising a specific percentage:

• Visa Acquirer Monitoring Program (VAMP). Consolidated from the legacy VDMP and other programs on 31 March 2025. The merchant-level “excessive” threshold for North America, EU, and Asia-Pacific moved to 1.5% on 1 April 2026 (down from 2.2%). Acquirer-side thresholds are tighter.
• Mastercard Chargeback Monitoring Programs. The Excessive Chargeback Merchant tier sits at 1.5% with a 100-chargebacks-per-month minimum, exempting very low-volume merchants from the program even at higher percentages.

Hit either and you enter a remediation program: held deposits, fines per dispute, and, eventually, loss of card processing entirely.

The defense that matters most is the one that keeps you off the program list. The pattern is a rolling-window monitor on your own dispute rate, typically over 30-day and 90-day windows, with internal alert thresholds set well below whichever percentage the network currently applies. A reasonable internal “watch” level sits at roughly half the network’s warning level, so you have weeks to act, not days. The second half of the pattern is refunding Early Fraud Warnings proactively: an EFW that becomes a dispute counts against your rate; a refunded EFW does not. Stripe surfaces EFWs through the Radar pipeline; refunding within the window is a one-API-call operation.
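The rolling-window monitor can be sketched in a few lines. This version assumes a plain disputes-over-charges ratio within the window; the networks define their own numerators and denominators, so treat the function names, the ratio definition, and the 0.5 watch fraction as illustrative assumptions to check against the current program rules.

```python
from datetime import date, timedelta


def window_rate(dispute_dates, charge_dates, as_of: date, days: int) -> float:
    """Disputes divided by charges within the trailing window ending at as_of."""
    start = as_of - timedelta(days=days)
    disputes = sum(1 for d in dispute_dates if start < d <= as_of)
    charges = sum(1 for c in charge_dates if start < c <= as_of)
    return disputes / charges if charges else 0.0


def alert_level(rate: float, network_threshold: float, watch_fraction: float = 0.5) -> str:
    """Internal watch level sits at a fraction of the network's own threshold."""
    if rate >= network_threshold:
        return "breach"
    if rate >= network_threshold * watch_fraction:
        return "watch"
    return "ok"


as_of = date(2026, 5, 11)
charges = [as_of - timedelta(days=i % 30) for i in range(200)]
disputes = [as_of, as_of - timedelta(days=45)]

rate_30 = window_rate(disputes, charges, as_of, 30)
rate_90 = window_rate(disputes, charges, as_of, 90)
levels = (alert_level(rate_30, 0.015), alert_level(rate_90, 0.015))
```

Here the 30-day window looks healthy while the 90-day window already sits in the watch band, which is why both windows are worth tracking.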

Webhook handlers, idempotently

Stripe’s webhook delivery is at-least-once. Events such as charge.dispute.created, charge.dispute.funds_withdrawn, and charge.dispute.closed will arrive more than once during retry storms or after extended outages. Doing the dispute reply inline in the webhook receiver fails in two ways: a 5xx anywhere in the chain leaves Stripe retrying against partial state, and a transient database failure mid-handler can double-count or duplicate-write.

The pattern that survives this is the boring one. The receiver verifies the Stripe signature, deduplicates by event ID against a small idempotency table, writes a normalised record to a queue table, returns 200. That’s it. A separate processor — a cron job or a worker — reads the queue, reconciles against Stripe’s Disputes API (the API is the source of truth, not the webhook payload), takes the action, marks the queue row done. Idempotent at the event level, retry-safe at the action level. The same shape covers every adjacent event: EFW issued, payout reconciliation, payment intent succeeded, dispute closed. Five layers of resilience is just this pattern, applied at each retry boundary.
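The receiver and processor described above can be sketched with the standard library and SQLite. The table names and the fetch_dispute and act callables are hypothetical; the signature check follows Stripe’s documented scheme (HMAC-SHA256 over the timestamp, a dot, and the raw payload) but omits the timestamp tolerance check, and in production the official Stripe library’s verification helper would normally replace it.

```python
import hashlib
import hmac
import json
import sqlite3


def init_db(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS seen_events (event_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dispute_queue ("
        "event_id TEXT PRIMARY KEY, event_type TEXT NOT NULL, "
        "dispute_id TEXT NOT NULL, done INTEGER NOT NULL DEFAULT 0)"
    )


def signature_ok(payload: bytes, header: str, secret: str) -> bool:
    """Check a 't=...,v1=...' header. Timestamp tolerance is omitted in this sketch."""
    parts = [p.split("=", 1) for p in header.split(",") if "=" in p]
    timestamp = next((v for k, v in parts if k == "t"), None)
    candidates = [v for k, v in parts if k == "v1"]
    if timestamp is None or not candidates:
        return False
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return any(hmac.compare_digest(expected, c) for c in candidates)


def receive(conn, payload: bytes, header: str, secret: str) -> int:
    """Verify, deduplicate by event ID, enqueue, return an HTTP status. Nothing else."""
    if not signature_ok(payload, header, secret):
        return 400
    event = json.loads(payload)
    with conn:  # dedupe row and queue row commit together or not at all
        cur = conn.execute(
            "INSERT OR IGNORE INTO seen_events (event_id) VALUES (?)", (event["id"],)
        )
        if cur.rowcount == 1:  # first delivery of this event
            conn.execute(
                "INSERT INTO dispute_queue (event_id, event_type, dispute_id) "
                "VALUES (?, ?, ?)",
                (event["id"], event["type"], event["data"]["object"]["id"]),
            )
    return 200


def process_pending(conn, fetch_dispute, act) -> int:
    """Cron-side worker: re-read each dispute from the API, act, then mark the row done."""
    rows = conn.execute(
        "SELECT event_id, dispute_id FROM dispute_queue WHERE done = 0"
    ).fetchall()
    for event_id, dispute_id in rows:
        current = fetch_dispute(dispute_id)  # the API is the source of truth, not the payload
        act(current)                         # must itself be safe to repeat
        with conn:
            conn.execute("UPDATE dispute_queue SET done = 1 WHERE event_id = ?", (event_id,))
    return len(rows)
```

If the worker crashes between act and the UPDATE, the row is processed again on the next run, which is why the action has to be retry-safe in its own right.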

What evidence can’t fix

Some losses are structural. If a refund has been issued for the transaction, the dispute is almost always ceded — the network reads “refunded” as “merchant agrees there’s an issue,” regardless of the narrative attached. If the typed-signature field captured a name that doesn’t match the account name on file, the inconsistency reads as an unresolved identity problem and the issuer weighs against you. If your refund policy is real but buried three modals deep, “refund policy disclosed” is a low-confidence claim and the bank discounts it heavily.

None of those are evidence problems. They are intake, product, and policy problems that present as evidence problems on the dispute reply. Fixing them shifts the dispute rate before any individual reply gets written.

What this post is not

Not a guarantee about win rates. The networks weight evidence non-deterministically across categories, issuers, and regulatory regions. The model above describes structure: what kinds of evidence the schema is built around, and how network monitoring works in shape. Concrete weights, current thresholds, and outcome distributions are work each merchant has to measure and check against the network’s current published program. Threshold numbers in this post were accurate as of May 2026; both networks revise them on multi-year cycles.

Not advice on a specific dispute. The right move on any one dispute depends on the reason code, the cardholder’s region, your refund history, and your current rate. The framework is for shaping the system; individual disputes still need their own reading.

Primary sources. Stripe disputes documentation: docs.stripe.com/disputes. Dispute reasons and evidence categories: docs.stripe.com/disputes/categories. Webhook signing and idempotency guidance: docs.stripe.com/webhooks. EU Consumer Rights Directive 2011/83/EU Article 16: eur-lex.europa.eu. UK Consumer Contracts (Information, Cancellation and Additional Charges) Regulations 2013: legislation.gov.uk. Visa Acquirer Monitoring Program (VAMP) is documented in Visa’s Integrity Risk Program operating guidance; it consolidated and replaced the legacy Visa Dispute Monitoring Program on 31 March 2025. Mastercard’s Chargeback Monitoring Programs are documented in the Mastercard Chargeback Guide (available to merchants through their acquirer).