Deliverability at Scale — Case Study Portfolio: How Enterprise Brands Improved CLV

TL;DR: Paint after plumbing

These are anonymized, enterprise-scale programs run like operations, not art projects. Every number below is holdout-adjusted—meaning we kept randomized controls alive, even during “big weeks,” so the deltas you see are incremental, not last-click illusions. The thread through all five cases: deliverability first (domain/auth alignment, engagement banding, suppression discipline), then creative (proof-first, plain-English offers), then cadence, then incentives only where they pay. In that order, CLV moves. In any other order, you’re guessing and calling it strategy.

Methodology: what “holdout-adjusted” means here

We don’t grade programs with screenshots. We grade them with counterfactuals. For each intervention—deliverability fix, creative variant, cadence shift, or incentive gate—we withheld a randomized slice of the eligible audience and reported deltas versus that control. When structure changed (e.g., a new second-purchase accelerator), we also reserved a flow-level control (5–10% kept on the old flow) for 2–4 weeks. Deliverability confounders (complaints, seed/panel trendlines) were monitored across arms to make sure “lift” wasn’t just “the treatment group landed in the inbox and the control landed in Promotions.”
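
A minimal sketch of that arithmetic, assuming per-recipient net revenue has already been joined to the randomized arm assignments; orders, arms, and the arm labels are illustrative names, not any vendor’s API:

```python
def rpr(orders: dict[str, float], arms: dict[str, str], arm: str) -> float:
    """Revenue per recipient for one arm: total net revenue / arm size."""
    ids = [rid for rid, a in arms.items() if a == arm]
    return sum(orders.get(rid, 0.0) for rid in ids) / len(ids)

def holdout_adjusted_delta(orders: dict[str, float], arms: dict[str, str]) -> float:
    # The reported lift is treatment RPR minus control RPR,
    # not attributed revenue compared against zero.
    return rpr(orders, arms, "treatment") - rpr(orders, arms, "control")
```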

Primary dials

  • RPR (Revenue per Recipient): holdout-adjusted, split flows vs. campaigns, by channel (email/SMS).
  • P2-30: second-purchase rate within 30 days for newly exposed cohorts.
  • CLV horizon: 12 or 24 months, modeled from survival curves and order margins.
  • Payback month: first month cumulative gross margin ≥ CAC for a cohort (computation sketched after this list).
  • Discount reliance: % of repeat orders using sitewide codes (trend).
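
Two of these dials reduce to small, checkable functions. A sketch under assumed inputs (per-customer first/second order dates, a cohort’s monthly margin series, and CAC); all names are illustrative:

```python
from datetime import date, timedelta

def p2_30(first_orders: dict[str, date], second_orders: dict[str, date]) -> float:
    """P2-30: share of new customers whose second purchase lands within 30 days."""
    cohort = list(first_orders)
    hits = sum(
        1 for rid in cohort
        if rid in second_orders
        and second_orders[rid] - first_orders[rid] <= timedelta(days=30)
    )
    return hits / len(cohort)

def payback_month(monthly_margin: list[float], cac: float) -> int | None:
    """First month (1-indexed) where cumulative gross margin covers CAC."""
    total = 0.0
    for month, margin in enumerate(monthly_margin, start=1):
        total += margin
        if total >= cac:
            return month
    return None  # not paid back within the horizon yet
```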

Deliverability guardrails

  • Dedicated sending domain per stream, SPF/DKIM aligned, DMARC policy enforced post-stabilization.
  • Engagement banding (0–30/31–60/61–90), strict sunset, provider-level throttles (band assignment sketched after this list).
  • Complaint rate by mailbox provider (Gmail/Yahoo/Outlook), seed/panel trendlines (not single snapshots).
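
A minimal banding sketch. The guardrail gives the cut points but not the clock; reading them as days since last open or click is our assumption:

```python
def engagement_band(days_since_last_engagement: int) -> str | None:
    # Cut points from the guardrails above; "days since last open/click"
    # is an assumed interpretation, not stated in the post.
    if days_since_last_engagement <= 30:
        return "0-30"
    if days_since_last_engagement <= 60:
        return "31-60"
    if days_since_last_engagement <= 90:
        return "61-90"
    return None  # past the strict sunset: suppress from marketing sends
```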

First principles: the confounders that fake “lift”

Before the stories, the traps. Most “messaging wins” die on the CFO’s desk because they were confounded:

  • Placement drift: the treatment arm enjoyed the inbox; the control languished in Promotions. Fix: stratify by engagement band; watch complaints and seeds across arms; pause and re-randomize if drift appears.
  • Identity mess: duplicates and mismatched external IDs cause “new money” to appear where it simply moved. Fix: choose a customer key, normalize email/phone, dedupe before you test.
  • Revenue inflation: gross revenue instead of margin, refunds ignored, or attribution windows bigger than the test. Fix: net revenue and consistent windows aligned to the test period.
  • Discount ghosting: high offer acceptance interpreted as lift, even when sure-things cashed the perk. Fix: uplift tests; pay only where treatment effect > 0 after discount cost (gating sketched after this list).
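
The gating rule is one comparison. A sketch with illustrative per-head averages; margins here are measured before discount cost so the redemption cost can be netted out explicitly:

```python
def discount_passes_gate(treated_margin: float, control_margin: float,
                         discount_cost: float) -> bool:
    """All inputs are per-head averages; margins are pre-discount."""
    # Keep the incentive only if the treatment effect survives the
    # discount's cost; otherwise acceptance was sure-things cashing perks.
    uplift = treated_margin - control_margin
    return uplift - discount_cost > 0
```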

With those out of the way, you can read what follows as a series of operating stories, not marketing lore.

Case 1 — North America Apparel: placement stabilization → CLV up

Profile: fashion e-commerce with seasonal peaks; AOV ~$68; multi-brand list in one ESP; North America focus.

Baseline (28-day average)

  • Complaint rate: Gmail 0.11% (red), Yahoo 0.06%, Outlook 0.07%.
  • Seed/panel: trending down for marketing stream; lifecycle steady.
  • RPR (email): flows $0.78, campaigns $0.42 (both attributed; no controls yet).
  • P2-30: 17.8% for newly exposed cohorts.
  • CLV_12 (cohort median): $118 margin; payback month 5.
  • Discount reliance: 41% of repeat orders used sitewide codes.

Diagnosis

Over-mailing unengaged audiences during “big weeks,” image-only templates in the flagship campaign, and a link shortener on a domain that didn’t match the sender. Complaint tick-ups lined up with campaign days. The program looked busy; reputation was quietly falling over.

Interventions

  1. Deliverability plumbing: moved campaigns to news.brand.com, lifecycle to updates.brand.com; SPF/DKIM aligned; DMARC at p=none then to quarantine post-stabilization; branded tracking CNAME.
  2. Banding: 0–30/31–60/61–90 with strict sunset; “no exceptions” policy.
  3. Template refactor: accessible HTML; live text for headers; List-Unsubscribe + List-Unsubscribe-Post; proof-first content variant (headers sketched after this list).
  4. Holdouts: 20% message-level for flagship campaign; 10% flow-level for second-purchase accelerator redesign.
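
The one-click unsubscribe pair in intervention 3 follows RFC 8058 (List-Unsubscribe-Post) alongside the older List-Unsubscribe header. A sketch using Python’s standard email library; the addresses, URL, and token are placeholders:

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "Brand <hello@news.brand.com>"   # campaign-stream domain
msg["To"] = "customer@example.com"
msg["Subject"] = "New arrivals"
msg.set_content("Live-text body that machines and humans can parse.")
# RFC 8058 one-click unsubscribe; URL and mailto are placeholders.
msg["List-Unsubscribe"] = "<https://news.brand.com/unsub/TOKEN>, <mailto:unsub@news.brand.com>"
msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
```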

Results (day 45 vs baseline; holdout-adjusted)

  • Complaint rate: Gmail 0.04% (green), Yahoo 0.03%, Outlook 0.04%.
  • Seeds: campaign stream trendline recovered to pre-slump levels; lifecycle steady.
  • RPR (email): flows $0.88 (+$0.10); campaigns $0.49 (+$0.07).
  • P2-30: +2.6 pts (to 20.4%).
  • Discount reliance: −9 pts (to 32%).
  • CLV_12: modeled increase +$14–$18 per new cohort; payback pulled left by ~0.6 months.

Why it mattered

Nothing fancy. We took our foot off the reputation hose, replaced “image posters” with emails machines and humans could parse, and stopped talking to people who’d already told us they weren’t listening. Flows paid; campaigns stopped stealing oxygen; CLV rose because trust did.

Case 2 — EU CPG: banding + proof-first → payback left by a month

Profile: pan-EU consumables; AOV ~$42; multilingual (EN/DE/FR/IT/ES); heavy seasonal promotions.

Baseline

  • Deliverability: sending domains combined; DMARC p=none; transactional and promotional mail shared one stream.
  • RPR (flows): $0.61; RPR (campaigns): $0.37.
  • P2-30: 15.2% (promo cohorts underperformed).
  • CLV_12: $96; payback month 6.
  • Discount reliance: 48% of repeat orders used sitewide codes.

Interventions

  1. Domain split: news.eu.brand.com (marketing), updates.eu.brand.com (lifecycle), notify.eu.brand.com (transactional). DMARC enforced post-stability; BIMI later.
  2. Language packs: single logic with dictionaries; translators edited keys; RTL not required here.
  3. Banding & throttles: per-provider caps; send-time distribution (no :00 pileups).
  4. Proof-first variant: A/B (then bandit) of proof-first vs. offer-first per language; 15% message-level holdouts for campaign and save touches (bandit allocation sketched after this list).
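
The post doesn’t name the bandit algorithm; one common choice for a two-arm creative test is Beta-Bernoulli Thompson sampling on a binary conversion proxy. A sketch with illustrative counts; scoring wins/losses per send is an assumption:

```python
import random

# Per-language arm stats; counts are illustrative.
arms = {
    "proof_first": {"wins": 42, "losses": 958},
    "offer_first": {"wins": 35, "losses": 965},
}

def pick_variant() -> str:
    # Thompson sampling: draw from each arm's Beta posterior and send the
    # next message with whichever arm samples highest. Traffic drifts
    # toward winners without anyone arguing about creative philosophy.
    draws = {
        name: random.betavariate(s["wins"] + 1, s["losses"] + 1)
        for name, s in arms.items()
    }
    return max(draws, key=draws.get)
```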

Results (8 weeks; holdout-adjusted; median across markets)

  • RPR (flows): $0.70 (+$0.09); (campaigns): $0.44 (+$0.07).
  • P2-30: +3.1 pts (promo cohorts recovered ~+4.0 pts).
  • Discount reliance: −12 pts (to 36%).
  • CLV_12: modeled increase +$19–$24; payback moved from month 6 to month 5.
  • Complaints: below 0.05% across mailbox providers after week 2.

Note on multilingual nuance

Proof-first won in DE/FR; offer-first briefly edged ahead in IT during a limited campaign but failed to sustain RPR without raising reliance. The program kept bandits on so traffic followed winners without staff arguing in three languages about creative philosophy.

Case 3 — APAC Beauty: SMS quiet hours + consent cleanup → revenue per head

Profile: beauty brand operating AU/NZ/SG/HK; email steady; SMS opt-outs rising; complaints creeping after late-night sends.

Baseline

  • SMS opt-out per send: 0.9–1.2% (spiking during promotions).
  • RPR (SMS nudges): $0.04; (email flows): $0.73.
  • P2-30: flat at ~16.0%.
  • Quiet hours: ad-hoc; timezone on profile missing for ~40% of SMS opt-ins.

Interventions

  1. Profile quiet hours: local windows written to the profile (e.g., 20:00–08:00); applied in orchestration; exceptions limited to delivery notifications (check sketched after this list).
  2. Consent cleanup: reconciled SMS opt-ins with evidence; removed ambiguous CSV imports; localized opt-out keywords.
  3. Snooze link: added “Snooze 7 days” to all promotional SMS.
  4. Holdouts: SMS nudge holdout 20% inside email flows; SMS-only campaign holdout 15%.
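
The quiet-hours gate is a small timezone check. A sketch using the standard zoneinfo module; the window and names are illustrative, and a missing profile timezone (the ~40% problem above) holds the send rather than guessing:

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

QUIET_START, QUIET_END = time(20, 0), time(8, 0)  # 20:00–08:00 local

def sms_allowed_now(profile_timezone: str | None) -> bool:
    """True if a promotional SMS may go out now in the profile's local time."""
    if profile_timezone is None:
        return False  # unknown timezone: hold rather than guess
    local = datetime.now(ZoneInfo(profile_timezone)).time()
    in_quiet = local >= QUIET_START or local < QUIET_END  # window wraps midnight
    return not in_quiet
```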

Results (6 weeks; holdout-adjusted)

  • RPR (SMS nudge): $0.06 (+$0.02) at a cost per message of ~$0.012; incremental contribution positive.
  • Opt-out per send: 0.4–0.6% (down ~40–50%).
  • P2-30: +1.9 pts for cohorts receiving the nudge inside flows.
  • Complaints: minor reduction; primary change was audience health and LTV contribution.

What changed

SMS did its job when it behaved like a nudge, not a bullhorn. Consent certainty + quiet hours + a snooze button turned “expensive taps” into predictable incremental margin, and the list stopped bleeding. Email resumed carrying the narrative; SMS proved its seat with math.

Case 4 — Global Subscription Wellness: reason-based saves → churn down

Profile: subscription consumable; upcoming-charge email lacked one-tap control; cancel form collected reasons but flows didn’t branch.

Baseline

  • Cancel rate at touchpoint: 14–16% of at-risk subscribers.
  • Save rate: 22% (unstructured; blanket 10% off occasionally).
  • Churned cohorts CLV_24: $219.

Interventions

  1. One-tap control: skip/swap/pause deep links inside upcoming-charge emails/SMS; “add-on tile” added.
  2. Reason branching: “too much” → cadence/quantity change; “didn’t work” → variant swap + micro-education; “price” → one-time loyalty points boost; no permanent discount (routing sketched after this list).
  3. Holdouts: 15% message-level holdout for upcoming-charge; 10% holdout for cancel intercept branch.
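
The branch logic is a lookup, not a model. A sketch with hypothetical reason codes and branch names:

```python
# Hypothetical reason codes mapped to the save branches described above.
SAVE_BRANCHES = {
    "too_much": "cadence_or_quantity_change",
    "didnt_work": "variant_swap_plus_micro_education",
    "price": "one_time_loyalty_points_boost",  # never a permanent discount
}

def route_cancel_intent(reason: str) -> str:
    # Unknown reasons fall through to the plain cancel path rather than
    # a blanket offer; paying sure-things is exactly what we're avoiding.
    return SAVE_BRANCHES.get(reason, "confirm_cancel")
```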

Results (8 weeks; holdout-adjusted)

  • Save rate: 31% (from 22%).
  • Upcoming-charge add-on attach: 12% (from 7%).
  • Modeled CLV_24: +$27–$34 per subscriber across treated cohorts.
  • Discount reliance: down; loyalty point cost lower than blanket code leakage.

Why it mattered

We stopped paying sure-things and started giving control to the people who would otherwise leave. The polite version of “no” (skip/swap/pause) made more money than the loud version of “please” (10% off forever). Finance nodded; customers stayed.

Case 5 — Multilingual Media Retail: language packs without chaos

Profile: global brand; 6 languages; duplicated flows per language; legal footers diverged; design debt everywhere.

Baseline

  • Time to ship a change across languages: 3–5 days.
  • Complaint rate spikes followed “catch-up” weeks.
  • RPR inconsistent across languages; experiments impossible to compare.

Interventions

  1. Language-pack template system: one logic per journey; translators edited keys; safe fallbacks; dir="rtl" support added for future locales (lookup sketched after this list).
  2. Glossary: product names and banned phrases per language; tone guidance.
  3. QA SLA: encoding, truncation, legal presence, links, accessibility, directionality.
  4. Holdouts/bandits unified: tests ran once; results applied everywhere; bandits allocated traffic by language.
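
At its core the language pack is a keyed dictionary with a safe fallback, so a missing translation degrades to the default language instead of breaking a send. A sketch with illustrative keys and strings:

```python
PACKS = {
    "en": {"cta": "Shop the collection", "greeting": "Hi {name},"},
    "de": {"cta": "Zur Kollektion"},  # "greeting" intentionally missing
}

def t(key: str, lang: str, default_lang: str = "en") -> str:
    value = PACKS.get(lang, {}).get(key)
    return value if value is not None else PACKS[default_lang][key]

print(t("cta", "de"))       # "Zur Kollektion"
print(t("greeting", "de"))  # falls back to "Hi {name},"
```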

Results (6 weeks; median across languages)

  • Maintenance hours: −40%.
  • Complaint rate: reduced variability; spikes disappeared because “catch-up blasts” ended.
  • RPR (flows): +$0.06 median; P2-30 +1.5 pts.
  • CLV_12: modeled +$8–$12 per cohort; payback improved modestly.

Why it mattered

The victory wasn’t a huge single dial. It was a program you could explain and test. When language stopped being a multiplication problem and became a dictionary, experimentation finally paid.

Meta-analysis: what repeated across brands

Five brands. Different geos, stacks, and risk appetites. The pattern held:

  1. Deliverability first: Every CLV lift began with domain/auth alignment, engagement banding, and message refactors machines could parse. No exceptions.
  2. Flows paid, campaigns stopped stealing oxygen: Splitting RPR (flows vs. campaigns) changed calendars and calmed teams. When campaigns dragged RPR, we fixed or removed them.
  3. Proof-first creative outperformed offer-first in lifecycle: Fewer spikes, better P2-30, lower discount reliance. Offers still mattered—only where uplift proved them worthy.
  4. Quiet hours + consent discipline saved SMS: Lists stopped bleeding; SMS earned its place as a nudge with math, not slogans.
  5. Language packs beat duplicated logic: Duplicates created drift; dictionaries created speed.

The delta wasn’t a trick. It was discipline. Case after case, CLV improved because trust improved—and trust improved because the program respected people and platforms.

Operating system: how to make the wins durable

Publish thresholds and policies

  • Complaint thresholds (yellow/red) per provider; freeze rules (sketched after this list).
  • Engagement banding and sunset policy (no exceptions).
  • Quiet hours and send windows by region.
  • Discount gating: uplift or it’s off.
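
A sketch of published thresholds as code. The cut points below are assumptions chosen to match how the cases read their numbers (0.11% as red, 0.04% as green); publish your own and enforce them:

```python
THRESHOLDS = {  # provider: (yellow, red), complaint rate as a fraction
    "gmail":   (0.0005, 0.0010),
    "yahoo":   (0.0005, 0.0010),
    "outlook": (0.0005, 0.0010),
}

def freeze_action(provider: str, complaint_rate: float) -> str:
    yellow, red = THRESHOLDS[provider]
    if complaint_rate >= red:
        return "red: freeze non-lifecycle sends; re-check banding and suppressions"
    if complaint_rate >= yellow:
        return "yellow: tighten bands, slow throttles, review recent sends"
    return "green: proceed"
```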

Weekly rituals (10 minutes)

  • Holdout-adjusted RPR split; P2-30 for exposed cohorts.
  • Incremental margin summary (after costs); payback movement.
  • Trust dials: complaints, seeds, unsub/opt-out.
  • Two bullets: what changed; what we’ll test next.

QA and change control

  • Templates: live text, alt text, AAA contrast, one-click unsub headers.
  • Links: branded tracking, short redirect chains, localized routes.
  • Change-freeze during risk windows; rollback owner named.

60-day replication plan

Weeks 1–2: Plumbing

  • Audit domains; align SPF/DKIM; set DMARC to p=none and plan enforcement (record sketched after this list).
  • Publish banding/sunset policy; implement suppressions.
  • Refactor one high-impact template to accessible HTML; add List-Unsubscribe-Post.
  • Start seed/panel trendlines and weekly deliverability readouts.
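
For reference, the monitoring-phase DMARC policy is a single TXT record published at _dmarc.<domain>. The strings below use standard DMARC tags (v, p, rua); the domain and report address are placeholders:

```python
# Hypothetical DMARC TXT records for _dmarc.brand.com; addresses are placeholders.
DMARC_MONITOR = "v=DMARC1; p=none; rua=mailto:dmarc-reports@brand.com"
# The planned enforcement step once auth alignment is stable (weeks 7-8):
DMARC_ENFORCE = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@brand.com"
```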

Weeks 3–4: First proofs

  • Turn on message-level holdouts (15–20%) for save/recommendation touches.
  • Split RPR (flows vs. campaigns); pause one campaign to measure opportunity cost.
  • Add quiet hours to the SMS profile; localize opt-outs; add a “Snooze 7 days” link.

Weeks 5–6: Structural lift

  • Rebuild second-purchase accelerator; reserve a 10% flow-level control; report P2-30 delta.
  • Run proof-first variant against offer-first in onboarding; graduate winner with a bandit.

Weeks 7–8: Incentives with math

  • Run an uplift test for discount gating in a clear risk band; turn off where treatment effect ≤ 0.
  • Move DMARC to quarantine if stable; rotate DKIM selectors.

Weeks 9–10: Globalize safely

  • Replace duplicated language flows with a language-pack template on one journey; verify fallbacks and QA across locales.
  • Throttle by provider; avoid top-of-hour pileups; stagger across regions.

Weeks 11–12: Make it boring

  • Freeze SOPs that worked (banding, QA, weekly readout).
  • Rerun an important test to confirm repeatability; schedule quarterly reruns for business-critical proofs.
  • Publish a two-page “How we prove lift” note internally so results survive turnover.

FAQ

Why tie case studies to deliverability?

Because placement confounds everything. You can’t measure creative or incentives honestly while the mailbox providers are still deciding whether you deserve to be seen. Deliverability first is not dogma—it’s math.

How big should holdouts be?

For high-traffic touches, 10–20% is healthy. For structural changes, keep a 5–10% flow-level control for 2–4 weeks. If you can’t afford the truth, you can’t afford the program.

Do seeds really matter?

Trendlines matter; single snapshots don’t. Seeds/panels help you see drift early, but holdouts decide budgets. Use both, with discipline.

How do we model CLV shift credibly?

Use historical cohort margins and survival curves; update monthly. Report the movement on payback and a sensible horizon (12 or 24 months). Keep assumptions modest; calibrate against actuals. If the model drifts and you can’t explain why, kill it and restart humbler.
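
A sketch of that model’s core step, assuming a historical survival curve (probability a cohort member is still active in month m) and a flat margin per active month; the inputs and names are illustrative:

```python
def modeled_clv(survival: list[float], margin_per_active_month: float,
                horizon_months: int = 12) -> float:
    """CLV over a horizon: expected active months times margin per active month."""
    months = min(horizon_months, len(survival))
    return sum(survival[m] * margin_per_active_month for m in range(months))

# Example: a 12-month curve from historical cohorts, $9 margin per active month.
curve = [1.0, 0.84, 0.72, 0.64, 0.58, 0.53, 0.49, 0.46, 0.43, 0.41, 0.39, 0.37]
print(round(modeled_clv(curve, 9.0), 2))  # modeled CLV_12 for the cohort
```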

What’s the fastest lever for CLV we’ve actually seen work?

A boring one: split RPR (flows vs. campaigns), fix placement, then rebuild post-purchase and second-purchase with proof-first creative. If you want a “wow,” gate discounts with uplift. But the reliable answer is: plumbing, then paint.
