Migration Playbook: Enterprise ESP/SMS Replatforms with Zero Downtime
Why replatform—and why “lift & shift” breaks
You replatform for one of four reasons: (1) data gravity changed and the old ESP/SMS is fighting your warehouse; (2) channels expanded (in-app/push) and orchestration needs grew up; (3) compliance & security standards tightened (SSO/MFA, DPAs, 10DLC); or (4) experimentation velocity hit a ceiling. All good reasons, none of which justify breaking deliverability or consent because you were in a hurry.
The lie of “lift & shift” is that messaging is files and settings. It’s not. It’s identity, consent, placement, and timing—living things that won’t survive a copy-paste. Treat replatforming like a change-control exercise, not a design sprint. Build plumbing first; paint after.
The risk map: where migrations actually fail
- Deliverability drift: new domain without warm-up; DMARC misalignment; image-only templates; mailing unengaged during “big weeks.”
- Identity fracture: external_id lost; email/phone merged incorrectly; anonymous sessions not stitched; duplicate profiles ballooning suppressions.
- Consent regressions: CSV imports with mystery opt-in states; mixed jurisdictions; SMS 10DLC campaigns mismatched to actual traffic; missing HELP/STOP/quiet hours.
- Event entropy: “Added to Cart” means five things; timestamps drift; attributions garbled; flows firing twice.
- Ops chaos: no send-freeze; no rollback; last-minute approvals; no owner to push the big red button.
This playbook is designed to neutralize each category before it gets expensive.
Week 0 audit: inventory, identity, consent, and data
Start with a humble spreadsheet. Your future self will call to thank you.
Inventory (flows, campaigns, assets)
- Flows: name, purpose, triggers, segments, languages, dynamic content, owner, KPI.
- Campaign cadences: volume by cohort, suppression rules, add-ons (AMP, animations).
- Assets: template partials, language packs, UGC blocks, product recommendation logic.
Identity contract
- Primary key per channel (email, phone, app id), external_id (CRM/warehouse).
- Anonymous → known stitching: how, where, when; cookie/session id mapping.
- De-duplication rules (email normalization, phone E.164, domain aliases).
Consent contract
- Email: opt-in source, timestamp, jurisdiction; double opt-in where required.
- SMS: brand & campaign registrations (10DLC), HELP/STOP keywords, quiet hours on profile.
- Tools: Dataships audit or OneTrust/Ketch/Transcend preference center + DSAR pipeline.
Data audit
- Events (names, schemas, timestamps, ids): Viewed Product, Add to Cart, Checkout Started, Placed Order, Refund Issued, Subscription events.
- Properties: zero-party data (ZPD: primary_goal, variant_pref), loyalty (points_to_next_reward, tier), geo/timezone.
- Retention & deletion: what do you keep, for how long, and how do you prove deletion?
Stack patterns: Klaviyo, Braze, Attentive, Postscript, Cordial
Tools are opinions about how work should happen. Respect their biases.
Klaviyo
- Strengths: Shopify gravity, fast lifecycle, email+SMS in one brain, dynamic blocks, straightforward A/Bs.
- Watch-outs: multilingual requires language packs/partials; advanced experimentation needs workarounds or bandit logic run in the warehouse.
- Migration notes: replicate flows with improvements, not just copies; map event names precisely; keep persistent holdouts.
Braze
- Strengths: multi-channel (email/SMS/push/in-app), robust experimentation, profile APIs.
- Watch-outs: needs warehouse/CDP discipline; QA pipeline heavier.
- Migration notes: enforce external_id early; align catalog feeds; port segmentation logic to Braze audience builder.
Attentive / Postscript (SMS)
- Attentive: growth tooling, compliance guardrails, journey builder, analytics. Enterprise-friendly.
- Postscript: lean, deep Shopify hooks, fast to iterate.
- Migration notes: 10DLC brand/campaigns first; map opt-ins/opt-outs; quiet hours on profile; test HELP/STOP.
Cordial
- Strengths: API-forward, flexible data model, scale.
- Watch-outs: bring your own experimentation rigor; explicit data contracts needed.
Warehouse & Reverse ETL: Snowflake/BigQuery + dbt + Hightouch/Census is the universal adapter. Keep your identity and risk in your house; let orchestration tools do the last mile.
Deliverability prerequisites: domains, DMARC, warm-up, seeds
Placement is a license, not a feeling. Treat it like change-controlled infrastructure.
- Dedicated sender domain: e.g., news.yourbrand.com. Configure SPF/DKIM, align DMARC. Start on p=none, move to quarantine then reject after stabilization.
- Tracking CNAMEs: branded click/open domains to avoid mismatched-domain flags.
- Engagement band policy: 0–30 / 31–60 / 61–90 days; sunset after two re-engagement touches.
- Warm-up plan: 2–3 weeks of banded sends; no blasts to unengaged; proof-first lifecycle before promos.
- Seed/panel baselines: measure placement before changes so you know what “good” was.
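The engagement band policy above can be sketched as a small classifier. This is a minimal illustration, not a platform feature: the function names, the `sunset_candidate` label, and the week-by-week ramp in `warmup_eligible` are assumptions made for the example; only the 0–30 / 31–60 / 61–90 band edges come from the policy itself.

```python
from datetime import date
from typing import Optional

# Band edges follow the 0-30 / 31-60 / 61-90 day policy described above.
BANDS = [(30, "0-30"), (60, "31-60"), (90, "61-90")]


def engagement_band(last_engaged: Optional[date], today: date) -> str:
    """Return the band label, or 'sunset_candidate' past 90 days or with no engagement."""
    if last_engaged is None:
        return "sunset_candidate"
    days = (today - last_engaged).days
    for upper, label in BANDS:
        if days <= upper:
            return label
    return "sunset_candidate"


def warmup_eligible(band: str, warmup_week: int) -> bool:
    """Hypothetical ramp: week 1 mails 0-30 only, week 2 adds 31-60, week 3 adds 61-90."""
    allowed = {
        1: {"0-30"},
        2: {"0-30", "31-60"},
        3: {"0-30", "31-60", "61-90"},
    }
    return band in allowed.get(warmup_week, set())
```

Profiles labeled `sunset_candidate` never enter a warm-up send in this sketch; they would go to the re-engagement touches instead.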
Event & identity mapping (with examples)
Write a dictionary: left column = old platform event/property, right column = new platform schema, with id and timestamp rules.
| Legacy | Target | Notes |
|---|---|---|
| Checkout Started | checkout_started | Ensure cart_id present; dedupe rapid retries by cart_id+timestamp |
| Placed Order | order_completed | Include order_id, order_value, currency, items[] (sku, qty, price) |
| Refund Issued | refund_issued | Emit negative revenue if your attribution expects net sales |
| Subscription Paused | subscription_paused | Key to reason-based saves; store reason enum |
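One way to apply the dictionary is a small translation step that renames events and drops rapid checkout retries. This is a sketch under assumptions: the event dict shape (`name`, `timestamp`, `cart_id`) and the 30-second retry window are illustrative choices, not values from any platform.

```python
from datetime import datetime, timedelta

# Legacy -> target names, taken from the mapping table above.
EVENT_MAP = {
    "Checkout Started": "checkout_started",
    "Placed Order": "order_completed",
    "Refund Issued": "refund_issued",
    "Subscription Paused": "subscription_paused",
}


def translate(events, retry_window=timedelta(seconds=30)):
    """Rename events; drop checkout_started retries for the same cart_id inside retry_window."""
    out = []
    last_checkout = {}  # cart_id -> timestamp of the last kept checkout_started
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        name = EVENT_MAP.get(ev["name"])
        if name is None:
            # Fail loudly: an unmapped event is a gap in the dictionary.
            raise KeyError(f"unmapped legacy event: {ev['name']}")
        if name == "checkout_started":
            cart_id = ev["cart_id"]  # the mapping notes require cart_id to be present
            prev = last_checkout.get(cart_id)
            if prev is not None and ev["timestamp"] - prev < retry_window:
                continue  # rapid retry of the same cart, skip it
            last_checkout[cart_id] = ev["timestamp"]
        out.append({**ev, "name": name})
    return out
```

Raising on unmapped names is deliberate here: during a migration it is usually cheaper to stop the pipeline than to let unknown events pass through silently.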
Identity merge rules (pseudo)
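-- Normalize email
email_norm = LOWER(TRIM(email))
-- Optional: strip dots from the local part, gmail.com addresses only
-- (lookahead is not supported by every SQL regex engine)
email_norm = IF(email_norm LIKE '%@gmail.com', REGEXP_REPLACE(email_norm, '\\.(?=[^@]*@)', ''), email_norm)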
-- Normalize phone
phone_e164 = TO_E164(phone_raw, country_hint)
-- Merge policy
customer_key = COALESCE(external_id, email_norm, phone_e164)
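The same merge rules can be sketched in Python for testing outside the warehouse. Treat this as an illustration under assumptions: `to_e164` here is a naive stand-in (digits only, default country code prepended) and not a real phone parser, and the Gmail domain list and function names are hypothetical.

```python
import re
from typing import Optional

GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(email: Optional[str], strip_gmail_dots: bool = False) -> Optional[str]:
    """Lowercase and trim; optionally drop dots in the local part for Gmail only."""
    if not email or "@" not in email:
        return None
    local, _, domain = email.strip().lower().rpartition("@")
    if strip_gmail_dots and domain in GMAIL_DOMAINS:
        local = local.replace(".", "")
    return f"{local}@{domain}" if local and domain else None


def to_e164(phone_raw: Optional[str], default_country_code: str = "1") -> Optional[str]:
    """Naive E.164 formatter (assumption): a production system should use a real parser."""
    if not phone_raw:
        return None
    raw = phone_raw.strip()
    digits = re.sub(r"\D", "", raw)
    if not digits:
        return None
    if not raw.startswith("+"):
        digits = default_country_code + digits
    # E.164 numbers carry at most 15 digits.
    return "+" + digits if len(digits) <= 15 else None


def customer_key(external_id: Optional[str], email: Optional[str], phone: Optional[str]) -> Optional[str]:
    """Mirror COALESCE(external_id, email_norm, phone_e164); empty strings count as missing."""
    return external_id or normalize_email(email) or to_e164(phone)
```

For example, `customer_key(None, " Jane@Example.com ", None)` falls through to the normalized email, while any profile with an `external_id` keeps that id as its key.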
Consent & 10DLC: make legal and carriers happy
Your consent logic should stand on its own absent the ESP/SMS tool. The platform executes; your warehouse proves.
- Source of truth: centralized consent table with channel, scope, jurisdiction, timestamp, source, ip.
- Jurisdiction logic: Dataships audit to identify gaps (e.g., missing double opt-in), or OneTrust/Ketch/Transcend preference center.
- 10DLC: register brand & campaigns; sample content must match traffic; update when journeys change; quiet hours enforced per profile; HELP/STOP wired.
- Footer & headers: List-Unsubscribe & List-Unsubscribe-Post (RFC 2369/8058) for one-click unsub.
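For the one-click unsubscribe headers, a minimal sketch using Python's standard `email` package is shown here. The helper name, the URL, and the mailbox are hypothetical placeholders; the header names and the `List-Unsubscribe=One-Click` value are the ones defined by RFC 2369 and RFC 8058.

```python
from email.message import EmailMessage


def add_one_click_unsubscribe(msg: EmailMessage, https_url: str, mailto: str) -> EmailMessage:
    """Attach List-Unsubscribe (RFC 2369) and List-Unsubscribe-Post (RFC 8058) headers."""
    if not https_url.startswith("https://"):
        # RFC 8058 one-click requires an HTTPS URI in List-Unsubscribe.
        raise ValueError("one-click unsubscribe needs an https:// URL")
    msg["List-Unsubscribe"] = f"<{https_url}>, <mailto:{mailto}>"
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    return msg


# Hypothetical values for illustration only.
example = EmailMessage()
example["Subject"] = "Your order shipped"
add_one_click_unsubscribe(
    example,
    "https://news.yourbrand.com/unsub/abc123",
    "unsub@news.yourbrand.com",
)
```

Most ESPs add these headers for you; the sketch is mainly useful for checking what a raw message from the new platform should contain during QA. RFC 8058 also expects both headers to be covered by a valid DKIM signature.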
Flow migration strategy: clone, improve, or redesign?
Everything in a replatform begs you to copy. Resist. Use migration to fix what your future team will hate.
- Clone when the logic is good and dependencies are stable (e.g., transactional notices).
- Improve when you can add a proof-first block, progress header (“You’re {{points_to_next_reward}} from $10 off”), or better segmentation.
- Redesign when the model wants it (Braze Canvas, Cordial data flows) or when multilingual needs require language packs instead of duplicated flows.
Start with the spine: Post-Purchase → Second-Purchase Accelerator → Replenishment → Winback → Subscription Saves. Those five pay rent.
Parallel sends & placement validation
There is no zero-downtime without parallelism. For two weeks, run critical flows in both platforms, mailing bands of engaged users. Measure, don’t hope.
- Seed/panel: daily snapshots; compare inbox rates; don’t chase single-day noise.
- Complaint dashboards: track per-domain; pause promos if Gmail moves.
- Holdouts: keep message-level controls for pilot flows; report holdout-adjusted RPR and conversion.
- Diff logs: when platforms disagree (segment size, dynamic content), log the diff and fix before cutover.
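A diff log for segment membership can be as simple as a set comparison between the two platforms' exports. The sketch below assumes you can export the customer keys in a segment from each side; the function name, the drift formula, and the 1% tolerance are illustrative assumptions.

```python
def segment_diff(legacy_ids, target_ids, tolerance=0.01):
    """Compare segment membership across platforms and report drift against the legacy size."""
    legacy, target = set(legacy_ids), set(target_ids)
    only_legacy = sorted(legacy - target)
    only_target = sorted(target - legacy)
    # Drift = profiles that disagree, as a fraction of the legacy segment.
    drift = (len(only_legacy) + len(only_target)) / max(len(legacy), 1)
    return {
        "legacy_size": len(legacy),
        "target_size": len(target),
        "only_in_legacy": only_legacy,
        "only_in_target": only_target,
        "drift": drift,
        "within_tolerance": drift <= tolerance,
    }
```

Comparing membership rather than only counts matters: two segments can have the same size and still contain different people, which is exactly the kind of disagreement to fix before cutover.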
Cutover day playbook & rollback
Go/no-go criteria
- New domain warm-up stable; seed/panel equal or better than baseline.
- Complaint rates ≤0.08% at Gmail; stable elsewhere.
- Critical flows validated with test profiles; no duplicate sends; timestamps correct.
- Consent & identity tables reconcile (spot-checks across cohorts).
- Rollback path signed (who presses it; which flows revert; DNS/ESP toggles).
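The criteria above can be encoded as an explicit gate so the go/no-go review produces a list of blockers rather than a debate. This is a sketch: the `CutoverSnapshot` fields are hypothetical names you would fill from your own dashboards, and only the 0.08% Gmail complaint threshold comes from the criteria.

```python
from dataclasses import dataclass
from typing import List, Tuple

GMAIL_COMPLAINT_MAX = 0.0008  # 0.08%, the threshold named in the criteria above


@dataclass
class CutoverSnapshot:
    # Field names are hypothetical; populate them from your own reporting.
    gmail_complaint_rate: float  # fraction of delivered mail, 0.0005 == 0.05%
    seed_inbox_rate: float       # current seed/panel inbox placement
    baseline_inbox_rate: float   # placement measured before the migration
    duplicate_sends: int         # duplicates found while validating test profiles
    consent_mismatches: int      # rows that failed consent/identity spot-checks
    rollback_signed: bool        # rollback owner and path approved


def go_no_go(s: CutoverSnapshot) -> Tuple[bool, List[str]]:
    """Return (go, blockers); go is True only when no criterion is violated."""
    blockers = []
    if s.gmail_complaint_rate > GMAIL_COMPLAINT_MAX:
        blockers.append("gmail complaint rate above 0.08%")
    if s.seed_inbox_rate < s.baseline_inbox_rate:
        blockers.append("seed/panel placement below baseline")
    if s.duplicate_sends > 0:
        blockers.append("duplicate sends detected")
    if s.consent_mismatches > 0:
        blockers.append("consent/identity tables do not reconcile")
    if not s.rollback_signed:
        blockers.append("rollback path not signed")
    return (not blockers, blockers)
```

The same function can be reused after cutover as a rollback trigger by feeding it the T+1h, T+4h, and T+24h snapshots.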
Day-of checklist
- Communication sent: “change freeze” notice; on-call matrix; escalation channel.
- Disable legacy triggers incrementally; enable new triggers; monitor diffs.
- Seed/panel check at T+1h, T+4h, T+24h; complaint watch continuous.
- Post a 24-hour cutover report: flows enabled, issues, mitigations, next checks.
Rollback
Rollback is not failure; it’s discipline. If complaints spike or seeds tank, revert flows, pause promos, fix, and try again. Document why.
Stabilization weeks: what to watch first
- Deliverability: complaint by domain, read-time proxies, seed/panel trendlines.
- Revenue: holdout-adjusted RPR (flows vs. campaigns), AOV and conversion where applicable.
- Retention: 30-day second-purchase rate for cohorts first exposed in new platform.
- SMS: opt-out rate per send; TCR flag status; quiet-hour compliance rate.
- Ops: incident log, QA misses, approval times; retro after 2 weeks.
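Holdout-adjusted RPR is not defined formally here, so the sketch below makes an assumption loudly: it treats RPR as revenue per recipient and the holdout-adjusted figure as the treated group's RPR minus the holdout group's RPR. The hash-based `in_holdout` assignment is likewise one possible way to keep holdouts persistent across platforms, not a feature of any specific tool.

```python
import hashlib


def in_holdout(customer_key: str, flow: str, pct: float = 0.10) -> bool:
    """Deterministic assignment: the same key and flow always land in the same group."""
    digest = hashlib.sha256(f"{flow}:{customer_key}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform-ish value in [0, 1)
    return bucket < pct


def rpr(revenue: float, recipients: int) -> float:
    """Revenue per recipient; zero when the group is empty."""
    return revenue / recipients if recipients else 0.0


def holdout_adjusted_rpr(treated_revenue: float, treated_n: int,
                         holdout_revenue: float, holdout_n: int) -> float:
    """Assumed definition: incremental revenue per recipient versus the holdout."""
    return rpr(treated_revenue, treated_n) - rpr(holdout_revenue, holdout_n)
```

Because the assignment depends only on the customer key and flow name, the legacy and new platforms can withhold the same people during parallel sends, provided both read the group from the warehouse.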
Multilingual & regional cutovers without chaos
Duplicating flows per language is how programs drown. Use language packs and partials—translators edit keys, not logic. For RTL (Arabic/Hebrew), flip containers (dir="rtl"), mirror icons, verify fonts, and run extra device QA. Consent text must be localized; quiet hours must respect local timezones. Roll regions in waves: one market per week, not five in a day.
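Quiet hours that respect local timezones come down to converting the send time into the profile's zone before checking the window. The sketch below assumes a 21:00–08:00 local window purely for illustration; actual quiet hours depend on jurisdiction and carrier rules, and the function name is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo


def in_quiet_hours(send_at_utc: datetime, profile_tz: tzinfo,
                   start: time = time(21, 0), end: time = time(8, 0)) -> bool:
    """True if the send time falls inside the quiet window in the profile's local time."""
    local = send_at_utc.astimezone(profile_tz).time()
    if start <= end:
        # Window within a single day, e.g. 01:00-05:00.
        return start <= local < end
    # Window that wraps past midnight, e.g. 21:00-08:00.
    return local >= start or local < end


# Fixed-offset zone used only to keep the example self-contained.
utc_minus_5 = timezone(timedelta(hours=-5))
blocked = in_quiet_hours(datetime(2024, 1, 15, 2, 30, tzinfo=timezone.utc), utc_minus_5)
```

In practice you would store an IANA zone name on the profile and pass `zoneinfo.ZoneInfo("America/New_York")` (Python 3.9+) instead of a fixed offset, so daylight saving transitions are handled.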
RACI, SLAs, and change control that prevent fire drills
| Task | R | A | C | I |
|---|---|---|---|---|
| Domain & DMARC setup | Deliverability lead | Head of Lifecycle | IT/Sec | Marketing |
| Event & identity mapping | Data engineer | Head of Data | ESP/SMS admins | Lifecycle |
| Consent reconciliation | Privacy lead | DPO/Legal | Lifecycle | IT/Sec |
| Parallel send validation | Producer | Head of Lifecycle | Deliverability | Stakeholders |
| Cutover & rollback | Producer | Head of Lifecycle | Deliverability/Data | Marketing/Legal |
Change-freeze clause (SOW excerpt)
During risk windows (warm-up, parallel sends, cutover, incidents), a change freeze is in effect. Only pre-approved critical fixes
may deploy. All other changes are queued. Exceptions require joint approval by Client Lifecycle Lead and Agency Producer.
QA & Send SLA (excerpt)
Agency provides [X] business hours for QA before any scheduled send: device rendering, link/UTM validation, segmentation checks,
accessibility (alt text, contrast). No template changes within [Y] minutes of send. Client consolidates feedback through one approver.
45/60-day timeline (day-by-day milestones)
Days 1–7: Audit & prerequisites
- Inventory flows/campaigns; identity/consent contracts documented.
- Dedicated sender domain provisioned; SPF/DKIM/DMARC configured (p=none).
- Engagement band policy and sunset enforcement turned on in legacy platform.
- Seed/panel baseline runs; complaint dashboards by domain configured.
- 10DLC brand/campaign registrations confirmed; HELP/STOP tested.
Days 8–21: Warm-up & mapping
- Warm-up: engaged cohorts only; lifecycle > promos.
- Event/identity mapping: schemas in dbt; sample payloads end-to-end; external_id stitched.
- Consent reconciliation: imports with source/timestamp/jurisdiction; fix mixed states.
- Template refactor: real text, language packs, accessibility passes.
Days 22–35: Parallel sends & fixes
- 2–3 critical flows in parallel; segment-size diffs investigated.
- Seeds: daily; complaints: continuous; anomalies triaged and fixed.
- Holdout-adjusted RPR collected for pilot flows; document “what changed.”
Days 36–42: Cutover
- Go/no-go review; cutover plan with timestamps; on-call schedule.
- Disable legacy triggers; enable new; monitor T+1h, T+4h, T+24h.
- Rollback if thresholds breached (complaints/placement); retro if executed.
Days 43–60: Stabilization
- Weekly readout: RPR (flows/campaigns), complaints, seeds, P2-rate (second-purchase rate), SMS opt-outs.
- Backlog: lift winners into more flows; retire parity builds that aren’t needed.
- Move DMARC to quarantine → reject when confident.
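The DMARC progression in this timeline maps to three versions of one DNS TXT record published at `_dmarc.<your sending domain>`. The helper below is a sketch for generating those record values; the function name and the reporting mailbox are hypothetical, while the `v`, `p`, `rua`, and `pct` tags are standard DMARC tags from RFC 7489.

```python
def dmarc_record(policy: str, rua: str, pct: int = 100) -> str:
    """Build a DMARC TXT record value for one stage of the none -> quarantine -> reject ramp."""
    if policy not in ("none", "quarantine", "reject"):
        raise ValueError(f"unknown DMARC policy: {policy}")
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    parts = ["v=DMARC1", f"p={policy}", f"rua=mailto:{rua}"]
    if pct != 100:
        # pct applies the policy to a share of failing mail, useful for a gradual ramp.
        parts.append(f"pct={pct}")
    return "; ".join(parts)


# Hypothetical reporting mailbox; one record per stage of the timeline.
stages = [
    dmarc_record("none", "dmarc@yourbrand.com"),
    dmarc_record("quarantine", "dmarc@yourbrand.com", pct=25),
    dmarc_record("reject", "dmarc@yourbrand.com"),
]
```

Keeping the `rua` reporting address in every stage means aggregate reports continue to arrive while the policy tightens, which is what tells you when it is safe to move to the next stage.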
Case snapshots: three migrations, three lessons
CPG (Klaviyo → Braze; Attentive retained)
Problem: app messaging needed; data lived in warehouse; legacy flows duplicated per language. Fix: external_id stitch + language packs; warehouse → Braze via Hightouch; parallel for 18 days. Outcome (day 45 vs baseline): complaint steady, inbox unchanged; +7% holdout-adjusted RPR on post-purchase; -22% build hours monthly due to partials.
Apparel (Cordial → Klaviyo; Postscript → Attentive)
Problem: slow experiments; SMS complaints; consent states inconsistent. Fix: Dataships audit; 10DLC re-registration; warm-up new domain; proof-first templates. Outcome: -34% SMS opt-outs, +9% RPR in second-purchase, complaint ≤0.05% Gmail.
Wellness (Klaviyo → Klaviyo consolidation across regions)
Problem: six accounts, duplicated flows, no RTL support. Fix: consolidate to one global pattern; lang packs; dir="rtl" for MEA; staged regional cutovers. Outcome: -40% maintenance hours, +6 pts placement in problematic region, no downtime.
FAQ
Can we migrate in two weeks?
Not safely. You can start in two weeks. Safe cutovers need warm-up, parallel, and validation. If someone promises “done in a week,” they’re selling open rates, not placement.
Do we need a CDP to replatform?
No. A warehouse + dbt + reverse ETL is enough for most enterprises. Add a CDP when you have app events and complex audience choreography.
How much of our calendar should we freeze?
During warm-up and cutover, freeze blasts to unengaged and limit promos to engaged cohorts. Flows can run; focus on proof-first content.
What’s the fastest way to tank placement?
Mail unengaged on a brand-new domain, ignore DMARC, send image-only templates, and “blast because it’s a big week.” Or you could not.
How do we prove we didn’t lose money?
Holdout-adjusted RPR (flows vs. campaigns), complaint by domain, seed/panel trendlines, and P2-rate for cohorts first exposed in the new platform. One weekly slide. No screenshots of pretty carousels.