Building a robust cash grant portal

Building a fault-tolerant national cash grant system.

When Guyana’s National Cash Grant portal went live yesterday, May 26, 2026, it immediately crashed and burned under laughably avoidable load and data-integrity failures. Frustrated by the mess, I took an hour to map out what a genuinely resilient architecture for this should actually look like.

Disclaimer: What follows is a conceptual model, not a ready-to-ship deployment blueprint. To stress-test the concept, I’m assuming a massive 950,000 total registrants, a figure that wildly overshoots the actual eligible population. The encryption strategies require dedicated security audits, and any real-world deployment demands rigorous load testing, strict coordination with actual banks, and legal clearance. If you’re building a system like this for production, hire seasoned engineers, security professionals, and legal counsel. Do not just copy-paste from a blog post.

TL;DR

Guiding principles behind this architecture.

High-level Overview

The architecture splits into three zones: a public-facing registration zone, an internal processing zone, and an isolated bank exchange zone.

All three sit within the local data center boundary. The three-zone split is the core architectural decision. The registration zone faces the public internet and is designed for massive concurrent throughput. The processing zone is internal-only, housing admin tooling, batch jobs, and notification services. The bank exchange zone is an isolated SFTP gateway that connects to each commercial bank. No bank has direct access to the registration system or the database; banks see only the encrypted batch files placed on the SFTP server.

Registration flow

This is the critical path. We assume it must handle 950k registrations without crashing. The guiding principle: accept fast, process later.

[Figure: high-level architecture diagram]

This path has seven layers, each absorbing load passed down from the one above.

The CDN handles approximately 70% of all requests by serving cached static assets. For the remaining dynamic requests, the WAF and rate limiter filter out malicious traffic and throttle abusive IPs. The citizen’s browser encrypts banking data client-side before submission, so plaintext banking details never reach the server.

The API gateway validates the payload (schema check, national ID format, idempotency key for duplicate prevention), and immediately returns HTTP 202 with a tracking ID. The citizen sees “Registration received” within 1-2 seconds. Behind the scenes, the submission is placed on the message queue, which can buffer up to 1 million messages. Write workers drain the queue at a pace the database can sustain (~2,000 writes/second), smoothing out any traffic spikes.
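The accept-fast, process-later gateway can be sketched as below. This is a minimal in-memory illustration, not the real implementation: the queue, idempotency store, field names, and national-ID format are all assumptions, and production would use a durable broker and a TTL-backed key store instead.

```python
import re
import uuid
from collections import deque

# Hypothetical in-memory stand-ins for the real infrastructure:
message_queue = deque()        # production: a durable broker (Kafka, SQS, etc.)
seen_idempotency_keys = set()  # production: Redis with a TTL

NATIONAL_ID_RE = re.compile(r"^\d{9}$")  # assumed ID format, for illustration only

def handle_registration(payload: dict) -> tuple[int, dict]:
    """Validate, deduplicate, enqueue, and return HTTP 202 immediately."""
    # 1. Schema check: reject malformed payloads before they touch the queue.
    if not {"national_id", "encrypted_bank_blob", "idempotency_key"} <= payload.keys():
        return 400, {"error": "missing required fields"}
    # 2. National ID format check.
    if not NATIONAL_ID_RE.match(payload["national_id"]):
        return 400, {"error": "invalid national ID format"}
    # 3. Idempotency: a retried submission is acknowledged, not duplicated.
    if payload["idempotency_key"] in seen_idempotency_keys:
        return 202, {"status": "already received"}
    seen_idempotency_keys.add(payload["idempotency_key"])
    # 4. Enqueue for the write workers; the database is never in the request path.
    tracking_id = str(uuid.uuid4())
    message_queue.append({**payload, "tracking_id": tracking_id})
    return 202, {"status": "registration received", "tracking_id": tracking_id}
```

Note that the database write happens nowhere in this handler; that is the whole point of returning 202 Accepted rather than 200 OK.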

Capacity Planning

To prevent crashes we need four mechanisms working together: CDN offload for static assets, stateless API instances that auto-scale horizontally, a message queue deep enough to buffer the entire surge, and write workers that drain the queue at a rate the database can sustain.

With these in place the system degrades gracefully instead of falling over the way the live portal did yesterday.

| Total registrations | Peak concurrent | Peak requests/sec | CDN absorbs | API instances | Queue capacity | DB writes/sec |
| --- | --- | --- | --- | --- | --- | --- |
| 950,000 | ~200k (4 hr surge window) | ~15k (static + dynamic) | ~70% | 8 (stateless, auto-scale) | 1M msgs | ~2k (queue-smoothed) |
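A quick back-of-envelope check shows why these numbers hang together. All figures come straight from the table; only the derived quantities are computed here.

```python
# Back-of-envelope check of the capacity table above.
total_registrations = 950_000
peak_rps = 15_000            # static + dynamic at peak
cdn_hit_ratio = 0.70         # fraction served from CDN cache
api_instances = 8
db_writes_per_s = 2_000      # queue-smoothed drain rate

dynamic_rps = peak_rps * (1 - cdn_hit_ratio)     # ~4,500 req/s reach the API tier
per_instance_rps = dynamic_rps / api_instances   # ~560 req/s per instance
backlog_drain_min = total_registrations / db_writes_per_s / 60  # ~8 minutes

# Even if every registration arrived at once, the workers empty the queue in
# minutes, which is why a 1M-message buffer comfortably covers the 950k total.
```

Roughly 560 requests per second per instance is well within reach of a stateless API node, and an eight-minute worst-case drain means the queue never comes close to its 1M-message ceiling.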

Batch disbursement pipeline

Most banks in the Caribbean do not expose a disbursement API, so we must build a disbursement engine that runs on a schedule, generates one payment file per bank, encrypts each file with the bank’s PGP key, and uploads it via SFTP.

The pipeline runs in a strict sequence: select the pending registrations, group them by bank, generate one payment file per bank, encrypt each file with that bank’s PGP public key, upload it via SFTP, and record the batch in the disbursement log.
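The generation step can be sketched as follows. This is a hedged illustration: the row shape, file format, and bank codes are invented for the example, and the PGP encryption and SFTP upload steps are deliberately left out since they depend on each bank’s key material and gateway.

```python
import hashlib
from collections import defaultdict
from datetime import date

def generate_batch_files(pending: list[dict]) -> dict[str, dict]:
    """Group pending disbursements by bank and build one payment file per bank.

    `pending` rows are assumed to look like
    {"citizen_ref": ..., "bank_code": ..., "amount": ...}.
    PGP encryption and SFTP upload are stubbed out; a real pipeline would
    encrypt with each bank's public key and upload only the ciphertext.
    """
    by_bank: dict[str, list[dict]] = defaultdict(list)
    for row in pending:
        # bank_code is stored in cleartext precisely so this routing step
        # never needs to decrypt the banking blob.
        by_bank[row["bank_code"]].append(row)

    batch_files = {}
    for bank_code, rows in by_bank.items():
        body = "\n".join(f'{r["citizen_ref"]},{r["amount"]}' for r in rows)
        batch_files[bank_code] = {
            "filename": f"grant_{bank_code}_{date.today():%Y%m%d}.csv",
            "row_count": len(rows),
            # file hash recorded at generation time for the audit chain
            "file_hash": hashlib.sha256(body.encode()).hexdigest(),
            "body": body,
        }
    return batch_files
```

Recording the SHA-256 of each file at generation time is what later makes post-hoc tampering detectable during reconciliation.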

Reconciliation

After a bank processes its file (typically overnight), it deposits a response file. The reconciler picks it up, parses it, and updates each citizen’s status.

The reconciler polls each bank’s SFTP return directory every 30 minutes. When a response file appears, it verifies the PGP signature (confirming the file came from the bank, not a tampered source), then parses each row.

Successful payments update the citizen’s record to payment_status = paid and trigger an SMS or email confirmation. Failed payments (invalid account, dormant account, name mismatch) are flagged for administrative review. The citizen receives a notification to update their banking details through the portal, which re-enters them into the next batch cycle. Common failure reasons and their resolution paths should be clearly defined in the admin interface so staff can resolve them quickly.
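The reconciler’s core loop reduces to sorting response rows into paid versus needs-review. A minimal sketch, assuming a hypothetical `citizen_ref,status,reason` row format (the real banks’ response formats will differ per institution):

```python
from dataclasses import dataclass, field

@dataclass
class ReconcileResult:
    paid: list[str] = field(default_factory=list)
    flagged: list[tuple[str, str]] = field(default_factory=list)

def reconcile(response_rows: list[str]) -> ReconcileResult:
    """Parse a bank response file and sort rows into paid vs. needs-review.

    Assumes each row is "citizen_ref,status,reason"; called only after the
    file's PGP signature has been verified.
    """
    result = ReconcileResult()
    for line in response_rows:
        citizen_ref, status, reason = line.split(",")
        if status == "OK":
            # production: set payment_status = paid, enqueue SMS/email
            result.paid.append(citizen_ref)
        else:
            # production: flag for admin review and notify the citizen
            # to update their banking details through the portal
            result.flagged.append((citizen_ref, reason))
    return result
```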

Encryption

I’m borrowing this concept from a payment gateway I built for a client that processes hundreds of thousands of transactions a month and must comply with strict regulations.

Layer 1 (E2E) protects banking data from everyone, including the government’s own system administrators, developers, and database operators. The browser encrypts the bank account details before submission. The encrypted blob travels through the API, queue, and into the database without any server-side component ever seeing plaintext. Only the HSM can decrypt, and only during the batch file generation step in a secure enclave.

Layer 2 (database TDE plus column-level AES) protects against unauthorized database access: SQL injection, a compromised DBA credential, or a leaked backup. Even an attacker with query access sees only ciphertext for sensitive fields like names and phone numbers.

Layer 3 (full-disk encryption) protects against physical theft. If someone walks out of the data center with a disk, the data is unreadable without the encryption keys, which are stored separately in the HSM.

The batch model adds a fourth encryption layer: PGP on the payment files. Each bank has its own PGP key pair, so even if an SFTP upload were intercepted, the file is readable only by the intended bank.

Data model

This is the final piece of the puzzle: what actually gets stored in the database. The banking table stores only the E2E-encrypted blob and a reference to which HSM key was used. The disbursement log tracks every batch file sent and every response received, creating a complete paper trail.

Each citizen has exactly one set of banking details (the E2E encrypted blob). When a batch runs, each citizen gets a row in the disbursements table linked to the batch file that included them. Each batch file maps to one bank and may have zero or one response file (zero if the bank hasn’t responded yet, one after reconciliation).
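The relations above can be sketched as record shapes. These are illustrative only; the real column names and types are the implementer’s call, and the foreign keys are shown as plain IDs.

```python
from dataclasses import dataclass

# Hypothetical table shapes mirroring the relations described above.

@dataclass
class BankingDetails:
    citizen_id: int
    bank_code: str         # cleartext: needed for batch routing
    encrypted_blob: bytes  # E2E-encrypted account details
    hsm_key_ref: str       # which HSM key can decrypt the blob

@dataclass
class BatchFile:
    id: int
    bank_code: str         # one batch file per bank per run
    file_hash: str         # SHA-256 recorded at generation time

@dataclass
class ResponseFile:
    batch_file_id: int     # zero or one response per batch file
    file_hash: str         # SHA-256 recorded at receipt time

@dataclass
class Disbursement:
    citizen_id: int
    batch_file_id: int     # the batch file that included this citizen
    payment_status: str    # lifecycle tracked per row
```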

Key modelling decisions in the schema:

The national_id_hash is a one-way SHA-256 hash used for deduplication. We can check “has this person already registered?” without storing the actual national ID in a queryable field. The bank_code on the banking details table is stored in cleartext because the batch engine needs it for routing (which bank gets which file) without decrypting the blob.
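A sketch of the deduplication check. One caveat worth surfacing: a bare SHA-256 of a small, enumerable ID space can be brute-forced from the stored hashes, so this version adds a secret pepper via HMAC as a hardening assumption on top of the plain SHA-256 the text describes. The pepper value and function names here are illustrative.

```python
import hashlib
import hmac

# Secret pepper held outside the database (e.g. in the HSM or a secrets
# manager). Without it, an attacker who dumps the hashes could enumerate
# every possible national ID and recover them all.
PEPPER = b"example-pepper-do-not-hardcode"

def national_id_hash(national_id: str) -> str:
    """One-way hash used for duplicate detection without storing the ID."""
    # HMAC-SHA-256: deterministic (same ID always maps to the same hash,
    # so the dedup lookup works) but keyed, so it cannot be brute-forced
    # without the pepper.
    return hmac.new(PEPPER, national_id.encode(), hashlib.sha256).hexdigest()

def already_registered(national_id: str, existing_hashes: set[str]) -> bool:
    return national_id_hash(national_id) in existing_hashes
```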

The BATCH_FILES and RESPONSE_FILES tables create a complete audit chain: every file generated, every file received, every row’s outcome. The file_hash column stores a SHA-256 digest of the file contents at generation and receipt time, so any post-hoc tampering is detectable.

The payment_status enum on disbursements tracks the lifecycle:
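As an assumption, here is one plausible state machine consistent with the flows described above: registrations start pending, get batched and submitted to a bank, then end as paid or failed, with failed rows re-entering the cycle after the citizen updates their banking details. The exact states and names are hypothetical.

```python
from enum import Enum

class PaymentStatus(Enum):
    # Hypothetical lifecycle consistent with the flows described above.
    PENDING = "pending"      # registered, not yet picked up by a batch
    BATCHED = "batched"      # included in a generated batch file
    SUBMITTED = "submitted"  # batch file uploaded to the bank's SFTP
    PAID = "paid"            # bank response row confirmed the deposit
    FAILED = "failed"        # invalid/dormant account or name mismatch

# Allowed transitions; FAILED loops back to PENDING once the citizen
# fixes their banking details, re-entering the next batch cycle.
TRANSITIONS = {
    PaymentStatus.PENDING: {PaymentStatus.BATCHED},
    PaymentStatus.BATCHED: {PaymentStatus.SUBMITTED},
    PaymentStatus.SUBMITTED: {PaymentStatus.PAID, PaymentStatus.FAILED},
    PaymentStatus.FAILED: {PaymentStatus.PENDING},
    PaymentStatus.PAID: set(),  # terminal state
}

def can_transition(a: PaymentStatus, b: PaymentStatus) -> bool:
    return b in TRANSITIONS[a]
```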