The Vault Pattern: Keeping PII Out of Your Apps

Every database that stores PII is a future incident report waiting to happen. The Vault pattern flips the usual question. Instead of asking "how do I secure this database that holds my users' SSNs?" it asks "why does this database have SSNs in it at all?"

The core idea

Your application services never store raw PII. They store tokens — opaque references that mean nothing on their own. The raw PII lives in exactly one place: a hardened vault service. Apps call out to the vault to tokenize on the way in, and detokenize when they actually need the value (rare, and ideally only at the edge — sending a confirmation email, generating a tax form).

[BAD]   Order DB  →  email: alice@example.com
[GOOD]  Order DB  →  email_token: tok_email_a8f3c1...
        Vault     →  tok_email_a8f3c1 ↔ alice@example.com  (encrypted)

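The application database no longer holds the regulated values, so most of the PCI / HIPAA / GDPR burden moves off it and onto the vault, which you harden once. How much scope actually drops depends on the regime: GDPR, for instance, still treats reversible tokens as pseudonymized personal data.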
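
A minimal sketch of that contract: the application keeps only the token, and the raw value comes back only when something at the edge asks for it. `InMemoryVault`, `tokenize`, and `detokenize` are hypothetical names, and the store is a plain dict. A real vault would encrypt at rest, authenticate callers, and audit every call.

```python
import secrets


class InMemoryVault:
    """Hypothetical toy vault mapping opaque tokens to raw values.

    Not production code: nothing is encrypted, authenticated, or audited.
    """

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def tokenize(self, field: str, value: str) -> str:
        # Random, prefix-tagged token; it means nothing without the vault.
        token = f"tok_{field}_{secrets.token_hex(8)}"
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._store[token]


vault = InMemoryVault()

# The order service persists only the token, never the raw email.
order_row = {
    "order_id": 1001,
    "email_token": vault.tokenize("email", "alice@example.com"),
}

# Only at the edge (sending the confirmation email) is the value revealed.
print(order_row["email_token"])                    # e.g. tok_email_3f9c0a1b2d4e5f60
print(vault.detokenize(order_row["email_token"]))  # alice@example.com
```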

System design

                  ┌────────────────────────────────────────┐
                  │  Application Tier (no PII at rest)     │
   User submits   │                                        │
   email + SSN ──►│   ┌─────────────┐                      │
                  │   │ API Gateway │                      │
                  │   └──────┬──────┘                      │
                  │          ▼                             │
                  │   ┌──────────────┐    ┌─────────────┐  │
                  │   │ Order        │───►│ Order DB    │  │
                  │   │ Service      │    │ (tokens     │  │
                  │   └──────┬───────┘    │  only)      │  │
                  │          │            └─────────────┘  │
                  └──────────┼─────────────────────────────┘
                             │  POST /tokenize
                             │  POST /detokenize
                             │  mTLS  +  scoped JWT
                             ▼
                  ┌────────────────────────────────────────┐
                  │  Vault Tier (PII boundary)             │
                  │                                        │
                  │   ┌──────────────┐                     │
                  │   │ Token API    │                     │
                  │   │ + Authz      │                     │
                  │   └──────┬───────┘                     │
                  │          ▼                             │
                  │   ┌──────────────┐    ┌─────────────┐  │
                  │   │ Encrypted KV │    │ Audit Log   │  │
                  │   │ (token →     │    │ (every      │  │
                  │   │  ciphertext) │    │  access)    │  │
                  │   └──────┬───────┘    └─────────────┘  │
                  │          │ envelope-encrypted DEKs     │
                  │          ▼                             │
                  │   ┌──────────────┐                     │
                  │   │  KMS / HSM   │                     │
                  │   └──────────────┘                     │
                  └────────────────────────────────────────┘

   App stores:    order_id, customer_token, email_token, ssn_token
   Vault stores:  token → encrypted PII, with full audit trail
   KMS holds:     master keys; per-tenant DEKs are envelope-encrypted
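
Envelope encryption, sketched under assumptions: the `cryptography` package's `AESGCM` stands in for both the KMS wrap operation and the data cipher, and `MASTER_KEY` is generated locally only so the example runs. In a real deployment the master key never leaves the KMS/HSM and wrapping goes through its API. The helper names (`wrap_dek`, `encrypt_value`, and so on) are illustrative.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

# Stand-in for the KMS/HSM master key. Generated locally only so the sketch
# runs; a real master key never leaves the KMS.
MASTER_KEY = AESGCM.generate_key(bit_length=256)


def wrap_dek(dek: bytes, tenant: str) -> bytes:
    """Encrypt a tenant's data-encryption key under the master key."""
    nonce = os.urandom(12)
    # Tenant ID as associated data: a wrapped DEK can't be replayed for another tenant.
    return nonce + AESGCM(MASTER_KEY).encrypt(nonce, dek, tenant.encode())


def unwrap_dek(wrapped: bytes, tenant: str) -> bytes:
    return AESGCM(MASTER_KEY).decrypt(wrapped[:12], wrapped[12:], tenant.encode())


def encrypt_value(dek: bytes, token: str, plaintext: str) -> bytes:
    nonce = os.urandom(12)
    # Token as associated data: a ciphertext moved onto another token row fails to decrypt.
    return nonce + AESGCM(dek).encrypt(nonce, plaintext.encode(), token.encode())


def decrypt_value(dek: bytes, token: str, blob: bytes) -> str:
    return AESGCM(dek).decrypt(blob[:12], blob[12:], token.encode()).decode()


# One DEK per tenant; only its wrapped form is ever stored.
tenant = "acme"
wrapped_dek = wrap_dek(AESGCM.generate_key(bit_length=256), tenant)

# Unwrap in memory, encrypt, then drop our reference to the plaintext DEK.
dek = unwrap_dek(wrapped_dek, tenant)
blob = encrypt_value(dek, "tok_ssn_7d21", "123-45-6789")
del dek

print(decrypt_value(unwrap_dek(wrapped_dek, tenant), "tok_ssn_7d21", blob))  # 123-45-6789
```

Only `wrapped_dek` and `blob` are persisted, so a dump of the KV store without KMS access yields ciphertext plus a key nobody can unwrap.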

Tokenize is the hot path on the way in. Detokenize should be rare and deliberate, scoped per-field. "Send the confirmation email" gets email access. "Generate the W-9" gets ssn access. Nothing gets both unless it really needs both, and every call is audited.
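
Per-field scoping and auditing can be sketched as a policy lookup in front of `detokenize`. The caller names (`email-sender`, `tax-forms`), the `POLICY` table, and the convention that the field name is the second segment of the token are all assumptions for illustration. Denied attempts are logged too.

```python
import datetime
from dataclasses import dataclass, field

# Hypothetical policy: which caller may detokenize which field.
# Field names follow the token prefix (tok_<field>_...).
POLICY: dict[str, set[str]] = {
    "email-sender": {"email"},
    "tax-forms": {"ssn"},
}


@dataclass
class ScopedVault:
    store: dict[str, str]
    audit_log: list[dict] = field(default_factory=list)

    def detokenize(self, caller: str, token: str, purpose: str) -> str:
        token_field = token.split("_")[1]  # "tok_email_ab12" -> "email"
        allowed = token_field in POLICY.get(caller, set())
        # Audit every attempt, including denials, before returning anything.
        self.audit_log.append({
            "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "caller": caller,
            "token": token,
            "field": token_field,
            "purpose": purpose,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{caller} may not detokenize {token_field}")
        return self.store[token]


vault = ScopedVault(store={
    "tok_email_ab12": "alice@example.com",
    "tok_ssn_cd34": "123-45-6789",
})

vault.detokenize("email-sender", "tok_email_ab12", "order confirmation")  # allowed
try:
    vault.detokenize("email-sender", "tok_ssn_cd34", "order confirmation")
except PermissionError as exc:
    print(exc)  # email-sender may not detokenize ssn
print(len(vault.audit_log))  # 2
```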

Token design choices

A few decisions shape what you can actually do with tokens.

Deterministic vs random. Deterministic tokens (the same input always produces the same token) let you join, dedupe, and run exact-match lookups across services without ever detokenizing. The price is that they leak equality: anyone holding two tokens can tell whether they refer to the same value. Random tokens give stronger privacy but break those flows. Many vaults support both per field: random for SSN, deterministic for email so analytics can still count unique users.
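
One common way to build deterministic tokens is a keyed HMAC over a normalized value; random tokens are just fresh randomness. The key name, the trim-and-lowercase normalization, and the 32-hex-character truncation below are assumptions. In both cases the vault still stores token → ciphertext for detokenization, because an HMAC is not reversible.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-field secret; in a real vault this lives in the KMS.
EMAIL_TOKEN_KEY = secrets.token_bytes(32)


def deterministic_token(value: str) -> str:
    # Keyed HMAC, not a bare hash: emails are guessable, so an unkeyed
    # SHA-256 could be reversed by hashing a list of candidate addresses.
    normalized = value.strip().lower()
    digest = hmac.new(EMAIL_TOKEN_KEY, normalized.encode(), hashlib.sha256).hexdigest()
    return f"tok_email_{digest[:32]}"


def random_token(field: str) -> str:
    # Fresh randomness on every call: no joins, no dedupe, no equality leak.
    return f"tok_{field}_{secrets.token_hex(16)}"


a = deterministic_token("Alice@Example.com")
b = deterministic_token("alice@example.com ")
print(a == b)  # True: same user, same token, so joins and distinct counts still work
print(random_token("ssn") == random_token("ssn"))  # False
```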

Format-preserving vs not. A 16-digit card tokenized to a 16-digit token slots into existing column types, which is convenient, but a leaked token looks like a valid card. Many teams skip format preservation and use prefix-tagged tokens (tok_card_...) so a leak is obviously inert.
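
Prefix tagging also makes tokens mechanically recognizable, so a data-access layer can refuse raw values in token columns. A sketch, assuming a hypothetical `tok_<field>_<hex>` grammar. With format-preserving tokens no such check is possible, since a 16-digit token and a 16-digit card number have the same shape.

```python
import re

# Hypothetical token grammar: tok_<field>_<16 to 64 lowercase hex chars>.
TOKEN_RE = re.compile(r"tok_[a-z]+_[0-9a-f]{16,64}")


def is_token(value: str) -> bool:
    return TOKEN_RE.fullmatch(value) is not None


def check_token_column(column: str, value: str) -> None:
    """Write-path guard: refuse raw values in columns meant to hold tokens."""
    if not is_token(value):
        raise ValueError(f"refusing to write a non-token value to {column}")


print(is_token("tok_card_9f2c4e81a07b3d55"))  # True
print(is_token("4111111111111111"))           # False (a public test card number)
check_token_column("card_token", "tok_card_9f2c4e81a07b3d55")  # passes silently
```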

Scope. A token issued for the customers domain shouldn't be detokenizable using marketing domain credentials. Per-domain encryption keys plus per-caller IAM keep the blast radius tight even within the vault.
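
The scope check might look like this, assuming the vault records the issuing domain and key ID with each token at tokenize time rather than trusting anything encoded in the token string. `CALLER_DOMAINS`, `Record`, and the key IDs are hypothetical, and encryption is omitted.

```python
from dataclasses import dataclass

# Hypothetical: which domains each caller's credentials cover.
CALLER_DOMAINS: dict[str, set[str]] = {
    "billing-svc": {"customers"},
    "campaign-svc": {"marketing"},
}


@dataclass(frozen=True)
class Record:
    domain: str        # domain that issued the token, recorded at tokenize time
    key_id: str        # per-domain key that encrypted this record
    ciphertext: bytes  # encrypted value (encryption omitted in this sketch)


RECORDS: dict[str, Record] = {
    "tok_email_51aa": Record("customers", "customers-dek-v1", b"<ciphertext>"),
}


def authorize_detokenize(caller: str, token: str) -> Record:
    record = RECORDS[token]
    # Checked before any key is touched, so a marketing credential never
    # causes the customers-domain key to be used.
    if record.domain not in CALLER_DOMAINS.get(caller, set()):
        raise PermissionError(f"{caller} has no access to domain {record.domain!r}")
    return record


print(authorize_detokenize("billing-svc", "tok_email_51aa").key_id)  # customers-dek-v1
try:
    authorize_detokenize("campaign-svc", "tok_email_51aa")
except PermissionError as exc:
    print(exc)  # campaign-svc has no access to domain 'customers'
```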

No client-side caching. A cache of detokenized values is tempting for latency but fatal for compliance: every detokenized value should live only in memory for the duration of the request, then be discarded.

What it actually buys you

Smaller blast radius. A breach of the order database leaks tokens, not PII. The incident response becomes "revoke the compromised service's vault credentials and review the audit log" instead of "notify every customer."
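Crypto-shredding. When a user invokes their right to be forgotten, you delete their per-user data-encryption key. The vault's ciphertext for that user becomes unrecoverable, so every token referencing them is cryptographically useless, without ever touching the application databases. This only works if keys exist at per-user granularity; a per-tenant DEK can't be destroyed for one user alone.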
One audit trail. Every read of every PII field is logged in one place. Compliance reviews stop being archaeology across a dozen services.
Centralized rotation. Key rotation, encryption upgrades, algorithm changes — they all happen in one service. Apps see no diff.
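
Crypto-shredding in miniature, assuming per-user DEKs and using the `cryptography` package's `AESGCM`. The in-memory dicts stand in for the vault's KV store and key store. In practice the DEK would be wrapped by the KMS, and shredding means scheduling its deletion there, including any copies in key-store backups.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

user_keys: dict[str, bytes] = {}               # user_id -> DEK (KMS-wrapped in practice)
vault_rows: dict[str, tuple[str, bytes]] = {}  # token -> (user_id, nonce + ciphertext)


def tokenize(user_id: str, field: str, value: str) -> str:
    if user_id not in user_keys:
        user_keys[user_id] = AESGCM.generate_key(bit_length=256)
    token = f"tok_{field}_{os.urandom(8).hex()}"
    nonce = os.urandom(12)
    ciphertext = AESGCM(user_keys[user_id]).encrypt(nonce, value.encode(), token.encode())
    vault_rows[token] = (user_id, nonce + ciphertext)
    return token


def detokenize(token: str) -> str:
    user_id, blob = vault_rows[token]
    dek = user_keys.get(user_id)
    if dek is None:
        raise LookupError(f"{token}: key destroyed, value unrecoverable")
    return AESGCM(dek).decrypt(blob[:12], blob[12:], token.encode()).decode()


def forget_user(user_id: str) -> None:
    # Right to be forgotten: destroy one key instead of hunting down rows
    # in every service. Key-store backups must be covered as well.
    user_keys.pop(user_id, None)


email_tok = tokenize("u_42", "email", "alice@example.com")
ssn_tok = tokenize("u_42", "ssn", "123-45-6789")
print(detokenize(email_tok))  # alice@example.com

forget_user("u_42")
for tok in (email_tok, ssn_tok):
    try:
        detokenize(tok)
    except LookupError as exc:
        print(exc)  # tok_...: key destroyed, value unrecoverable
```

The rows are still present in `vault_rows`; they are simply unreadable.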

What it costs you

It's not free. Every write of a PII field becomes a network round-trip. Every detokenize call adds latency. The vault is now on the critical path for some flows, which means it needs the same uptime engineering as your primary databases. And if you ever want to do ad-hoc analytics or fuzzy customer search against PII, you have to design for it up front — vector indexes on encrypted blobs are not a thing you bolt on later.

The right question is which fields are worth this cost. SSN, full PAN, government ID, bank account — always. Email and phone — usually. Display name — probably not. Tokenize the data whose leak would make the news.
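
That triage can live in code as an explicit classification table. The field names and the fail-closed rule for unclassified fields are assumptions, one possible policy rather than a standard.

```python
from enum import Enum


class Handling(Enum):
    RANDOM_TOKEN = "random"                # strongest; the token supports no joins
    DETERMINISTIC_TOKEN = "deterministic"  # equality joins and distinct counts still work
    PLAINTEXT = "plaintext"                # not worth a vault round-trip


# Hypothetical classification following the triage above.
FIELD_POLICY: dict[str, Handling] = {
    "ssn": Handling.RANDOM_TOKEN,
    "card_pan": Handling.RANDOM_TOKEN,
    "government_id": Handling.RANDOM_TOKEN,
    "bank_account": Handling.RANDOM_TOKEN,
    "email": Handling.DETERMINISTIC_TOKEN,
    "phone": Handling.DETERMINISTIC_TOKEN,
    "display_name": Handling.PLAINTEXT,
    "order_id": Handling.PLAINTEXT,
}


def fields_to_tokenize(record: dict) -> list[str]:
    # Fail closed: a field nobody has classified is an error, not plaintext.
    unknown = set(record) - set(FIELD_POLICY)
    if unknown:
        raise ValueError(f"unclassified fields: {sorted(unknown)}")
    return [name for name in record if FIELD_POLICY[name] is not Handling.PLAINTEXT]


print(fields_to_tokenize({
    "order_id": 1001,
    "display_name": "Alice",
    "email": "alice@example.com",
    "ssn": "123-45-6789",
}))  # ['email', 'ssn']
```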

The lesson

The strongest defense against losing PII is not having it in the first place. The vault pattern is just the practical version of that principle — push the regulated data into one well-guarded place, give every other service tokens, and let the blast radius shrink to a single, audit-friendly boundary.

When the post-mortem of a breach can be summarized as "they got tokens", you've done your job.