Building a robust cash grant portal

Building a fault-tolerant national cash grant system.

When Guyana’s National Cash Grant portal went live yesterday, May 26, 2026, it immediately crashed and burned under laughably avoidable load and data-integrity failures. Frustrated by the mess, I took an hour to map out what a genuinely resilient architecture for this should actually look like.

Disclaimer: What follows is a conceptual model, not a ready-to-ship deployment blueprint. To stress-test the concept, I’m assuming a massive 950,000 total registrants, a figure that wildly overshoots the actual eligible population. The encryption strategies require dedicated security audits, and any real-world deployment demands rigorous load testing, strict coordination with actual banks, and legal clearance. If you’re building a system like this for production, hire seasoned engineers, security professionals, and legal counsel. Do not just copy-paste from a blog post.

TL;DR

Guiding principles behind this architecture.

High-level Overview

The architecture splits into three zones: a public-facing registration zone, an internal processing zone, and an isolated bank exchange zone.

All three sit within the local data center boundary. The three-zone split is the core architectural decision. The registration zone faces the public internet and is designed for massive concurrent throughput. The processing zone is internal-only, housing admin tooling, batch jobs, and notification services. The bank exchange zone is an isolated SFTP gateway that connects to each commercial bank. No bank has direct access to the registration system or the database; banks see only the encrypted batch files placed on the SFTP server.

Registration flow

This is the critical path. We assume it must handle 950k registrations without crashing. The guiding principle: accept fast, process later.

[Figure: high-level architecture diagram]

This path has seven layers, each absorbing load passed down from the one above.

The CDN handles approximately 70% of all requests by serving cached static assets. For the remaining dynamic requests, the WAF and rate limiter filter out malicious traffic and throttle abusive IPs. The citizen’s browser encrypts banking data client-side before submission, so plaintext banking details never reach the server.

The API gateway validates the payload (schema check, national ID format, idempotency key for duplicate prevention), and immediately returns HTTP 202 with a tracking ID. The citizen sees “Registration received” within 1-2 seconds. Behind the scenes, the submission is placed on the message queue, which can buffer up to 1 million messages. Write workers drain the queue at a pace the database can sustain (~2,000 writes/second), smoothing out any traffic spikes.
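The accept-fast, process-later gateway can be sketched as below. This is a minimal in-memory illustration, not the real implementation: the queue, idempotency store, field names, and national-ID format are all assumptions, and production would use a durable broker and a TTL-backed key store instead.

```python
import re
import uuid
from collections import deque

# Hypothetical in-memory stand-ins for the real infrastructure:
message_queue = deque()        # production: a durable broker (Kafka, SQS, etc.)
seen_idempotency_keys = set()  # production: Redis with a TTL

NATIONAL_ID_RE = re.compile(r"^\d{9}$")  # assumed ID format, for illustration only

def handle_registration(payload: dict) -> tuple[int, dict]:
    """Validate, deduplicate, enqueue, and return HTTP 202 immediately."""
    # 1. Schema check: reject malformed payloads before they touch the queue.
    if not {"national_id", "encrypted_bank_blob", "idempotency_key"} <= payload.keys():
        return 400, {"error": "missing required fields"}
    # 2. National ID format check.
    if not NATIONAL_ID_RE.match(payload["national_id"]):
        return 400, {"error": "invalid national ID format"}
    # 3. Idempotency: a retried submission is acknowledged, not duplicated.
    if payload["idempotency_key"] in seen_idempotency_keys:
        return 202, {"status": "already received"}
    seen_idempotency_keys.add(payload["idempotency_key"])
    # 4. Enqueue for the write workers; the database is never in the request path.
    tracking_id = str(uuid.uuid4())
    message_queue.append({**payload, "tracking_id": tracking_id})
    return 202, {"status": "registration received", "tracking_id": tracking_id}
```

Note that the database write happens nowhere in this handler; that is the whole point of returning 202 Accepted rather than 200 OK.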

Capacity Planning

To prevent crashes we need four mechanisms working together: CDN offload for static assets, stateless API instances that auto-scale horizontally, a message queue deep enough to buffer the entire surge, and write workers that drain the queue at a rate the database can sustain.

With these in place the system degrades gracefully instead of falling over the way the live portal did yesterday.

| Total registrations | Peak concurrent | Peak requests/sec | CDN absorbs | API instances | Queue capacity | DB writes/sec |
| --- | --- | --- | --- | --- | --- | --- |
| 950,000 | ~200k (4 hr surge window) | ~15k (static + dynamic) | ~70% | 8 (stateless, auto-scale) | 1M msgs | ~2k (queue-smoothed) |
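A quick back-of-envelope check shows why these numbers hang together. All figures come straight from the table; only the derived quantities are computed here.

```python
# Back-of-envelope check of the capacity table above.
total_registrations = 950_000
peak_rps = 15_000            # static + dynamic at peak
cdn_hit_ratio = 0.70         # fraction served from CDN cache
api_instances = 8
db_writes_per_s = 2_000      # queue-smoothed drain rate

dynamic_rps = peak_rps * (1 - cdn_hit_ratio)     # ~4,500 req/s reach the API tier
per_instance_rps = dynamic_rps / api_instances   # ~560 req/s per instance
backlog_drain_min = total_registrations / db_writes_per_s / 60  # ~8 minutes

# Even if every registration arrived at once, the workers empty the queue in
# minutes, which is why a 1M-message buffer comfortably covers the 950k total.
```

Roughly 560 requests per second per instance is well within reach of a stateless API node, and an eight-minute worst-case drain means the queue never comes close to its 1M-message ceiling.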

Batch disbursement pipeline

Most banks in the Caribbean do not expose a disbursement API, so we must build a disbursement engine that runs on a schedule, generates one payment file per bank, encrypts each file with the bank’s PGP key, and uploads it via SFTP.

The pipeline runs in a strict sequence: select the pending registrations, group them by bank, generate one payment file per bank, encrypt each file with that bank’s PGP public key, upload it via SFTP, and record the batch in the disbursement log.
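The generation step can be sketched as follows. This is a hedged illustration: the row shape, file format, and bank codes are invented for the example, and the PGP encryption and SFTP upload steps are deliberately left out since they depend on each bank’s key material and gateway.

```python
import hashlib
from collections import defaultdict
from datetime import date

def generate_batch_files(pending: list[dict]) -> dict[str, dict]:
    """Group pending disbursements by bank and build one payment file per bank.

    `pending` rows are assumed to look like
    {"citizen_ref": ..., "bank_code": ..., "amount": ...}.
    PGP encryption and SFTP upload are stubbed out; a real pipeline would
    encrypt with each bank's public key and upload only the ciphertext.
    """
    by_bank: dict[str, list[dict]] = defaultdict(list)
    for row in pending:
        # bank_code is stored in cleartext precisely so this routing step
        # never needs to decrypt the banking blob.
        by_bank[row["bank_code"]].append(row)

    batch_files = {}
    for bank_code, rows in by_bank.items():
        body = "\n".join(f'{r["citizen_ref"]},{r["amount"]}' for r in rows)
        batch_files[bank_code] = {
            "filename": f"grant_{bank_code}_{date.today():%Y%m%d}.csv",
            "row_count": len(rows),
            # file hash recorded at generation time for the audit chain
            "file_hash": hashlib.sha256(body.encode()).hexdigest(),
            "body": body,
        }
    return batch_files
```

Recording the SHA-256 of each file at generation time is what later makes post-hoc tampering detectable during reconciliation.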

Reconciliation

After a bank processes its file (typically overnight), it deposits a response file. The reconciler picks it up, parses it, and updates each citizen’s status.

The reconciler polls each bank’s SFTP return directory every 30 minutes. When a response file appears, it verifies the PGP signature (confirming the file came from the bank, not a tampered source), then parses each row.

Successful payments update the citizen’s record to payment_status = paid and trigger an SMS or email confirmation. Failed payments (invalid account, dormant account, name mismatch) are flagged for administrative review. The citizen receives a notification to update their banking details through the portal, which re-enters them into the next batch cycle. Common failure reasons and their resolution paths should be clearly defined in the admin interface so staff can resolve them quickly.
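The reconciler’s core loop reduces to sorting response rows into paid versus needs-review. A minimal sketch, assuming a hypothetical `citizen_ref,status,reason` row format (the real banks’ response formats will differ per institution):

```python
from dataclasses import dataclass, field

@dataclass
class ReconcileResult:
    paid: list[str] = field(default_factory=list)
    flagged: list[tuple[str, str]] = field(default_factory=list)

def reconcile(response_rows: list[str]) -> ReconcileResult:
    """Parse a bank response file and sort rows into paid vs. needs-review.

    Assumes each row is "citizen_ref,status,reason"; called only after the
    file's PGP signature has been verified.
    """
    result = ReconcileResult()
    for line in response_rows:
        citizen_ref, status, reason = line.split(",")
        if status == "OK":
            # production: set payment_status = paid, enqueue SMS/email
            result.paid.append(citizen_ref)
        else:
            # production: flag for admin review and notify the citizen
            # to update their banking details through the portal
            result.flagged.append((citizen_ref, reason))
    return result
```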

Encryption

I’m borrowing this concept from a payment gateway I built for a client that processes hundreds of thousands of transactions a month and must comply with strict regulations.

Layer 1 (E2E) protects banking data from everyone, including the government’s own system administrators, developers, and database operators. The browser encrypts the bank account details before submission. The encrypted blob travels through the API, queue, and into the database without any server-side component ever seeing plaintext. Only the HSM can decrypt, and only during the batch file generation step in a secure enclave.

Layer 2 (database TDE plus column-level AES) protects against unauthorized database access: SQL injection, a compromised DBA credential, or a leaked backup. Even an attacker with query access sees only ciphertext for sensitive fields like names and phone numbers.

Layer 3 (full-disk encryption) protects against physical theft. If someone walks out of the data center with a disk, the data is unreadable without the encryption keys, which are stored separately in the HSM.

The batch model adds a fourth encryption layer: PGP on the payment files. Each bank has its own PGP key pair, so even if an SFTP upload were intercepted, the file is readable only by the intended bank.

Data model

This is the final piece of the puzzle: what actually gets stored in the database. The banking table stores only the E2E-encrypted blob and a reference to which HSM key was used. The disbursement log tracks every batch file sent and every response received, creating a complete paper trail.

Each citizen has exactly one set of banking details (the E2E encrypted blob). When a batch runs, each citizen gets a row in the disbursements table linked to the batch file that included them. Each batch file maps to one bank and may have zero or one response file (zero if the bank hasn’t responded yet, one after reconciliation).
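The relations above can be sketched as record shapes. These are illustrative only; the real column names and types are the implementer’s call, and the foreign keys are shown as plain IDs.

```python
from dataclasses import dataclass

# Hypothetical table shapes mirroring the relations described above.

@dataclass
class BankingDetails:
    citizen_id: int
    bank_code: str         # cleartext: needed for batch routing
    encrypted_blob: bytes  # E2E-encrypted account details
    hsm_key_ref: str       # which HSM key can decrypt the blob

@dataclass
class BatchFile:
    id: int
    bank_code: str         # one batch file per bank per run
    file_hash: str         # SHA-256 recorded at generation time

@dataclass
class ResponseFile:
    batch_file_id: int     # zero or one response per batch file
    file_hash: str         # SHA-256 recorded at receipt time

@dataclass
class Disbursement:
    citizen_id: int
    batch_file_id: int     # the batch file that included this citizen
    payment_status: str    # lifecycle tracked per row
```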

Key modelling decisions in the schema:

The national_id_hash is a one-way SHA-256 hash used for deduplication. We can check “has this person already registered?” without storing the actual national ID in a queryable field. The bank_code on the banking details table is stored in cleartext because the batch engine needs it for routing (which bank gets which file) without decrypting the blob.
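A sketch of the deduplication check. One caveat worth surfacing: a bare SHA-256 of a small, enumerable ID space can be brute-forced from the stored hashes, so this version adds a secret pepper via HMAC as a hardening assumption on top of the plain SHA-256 the text describes. The pepper value and function names here are illustrative.

```python
import hashlib
import hmac

# Secret pepper held outside the database (e.g. in the HSM or a secrets
# manager). Without it, an attacker who dumps the hashes could enumerate
# every possible national ID and recover them all.
PEPPER = b"example-pepper-do-not-hardcode"

def national_id_hash(national_id: str) -> str:
    """One-way hash used for duplicate detection without storing the ID."""
    # HMAC-SHA-256: deterministic (same ID always maps to the same hash,
    # so the dedup lookup works) but keyed, so it cannot be brute-forced
    # without the pepper.
    return hmac.new(PEPPER, national_id.encode(), hashlib.sha256).hexdigest()

def already_registered(national_id: str, existing_hashes: set[str]) -> bool:
    return national_id_hash(national_id) in existing_hashes
```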

The BATCH_FILES and RESPONSE_FILES tables create a complete audit chain: every file generated, every file received, every row’s outcome. The file_hash column stores a SHA-256 digest of the file contents at generation and receipt time, so any post-hoc tampering is detectable.

The payment_status enum on disbursements tracks the lifecycle:
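As an assumption, here is one plausible state machine consistent with the flows described above: registrations start pending, get batched and submitted to a bank, then end as paid or failed, with failed rows re-entering the cycle after the citizen updates their banking details. The exact states and names are hypothetical.

```python
from enum import Enum

class PaymentStatus(Enum):
    # Hypothetical lifecycle consistent with the flows described above.
    PENDING = "pending"      # registered, not yet picked up by a batch
    BATCHED = "batched"      # included in a generated batch file
    SUBMITTED = "submitted"  # batch file uploaded to the bank's SFTP
    PAID = "paid"            # bank response row confirmed the deposit
    FAILED = "failed"        # invalid/dormant account or name mismatch

# Allowed transitions; FAILED loops back to PENDING once the citizen
# fixes their banking details, re-entering the next batch cycle.
TRANSITIONS = {
    PaymentStatus.PENDING: {PaymentStatus.BATCHED},
    PaymentStatus.BATCHED: {PaymentStatus.SUBMITTED},
    PaymentStatus.SUBMITTED: {PaymentStatus.PAID, PaymentStatus.FAILED},
    PaymentStatus.FAILED: {PaymentStatus.PENDING},
    PaymentStatus.PAID: set(),  # terminal state
}

def can_transition(a: PaymentStatus, b: PaymentStatus) -> bool:
    return b in TRANSITIONS[a]
```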