Sentinel: How We Automated Odoo UI Testing With a Vision Model Feedback Loop

At Keen360, we implement Odoo for businesses across the Caribbean. That means multiple client instances, each with their own modules, workflows, and configurations. Every sprint, every customization, every upgrade comes with the same bottleneck: someone has to sit in front of a browser, click through test cases, take screenshots, and write up the results.

Our QA process was eating time we didn’t have. So I built Sentinel.

The problem with manual testing

Manual testing against Odoo is tedious for reasons anyone who has done it will recognise. The tester opens a browser, logs in, navigates three levels deep into a menu, fills out a form, clicks save, checks the status bar, scrolls to the chatter, confirms a message appeared, then screenshots everything and pastes it into a report. Multiply that by forty test cases per sprint across multiple projects and you start to understand the scale of the problem.

The real cost isn’t just the hours. It’s the inconsistency. One tester screenshots before the page finishes loading. Another forgets to check the chatter. A third marks something as passed because it “looked right” without verifying the exact field value. When you’re rolling out an ERP solution for multiple clients simultaneously, that inconsistency becomes a liability.

I needed a system where test cases are written once, executed identically every time, and reported automatically with evidence attached.

The core idea: let a vision model be the tester

Most test automation frameworks require you to write selectors — CSS paths, XPaths, data attributes. That works fine for products you control end to end, but Odoo’s DOM is dense, dynamic, and varies between versions. Maintaining a selector-based test suite against a platform that rearranges its markup with every update is a losing game.

Sentinel takes a different approach. Instead of selectors, testers write plain English:

{ "type": "natural", "instruction": "Click on Account manager Corporate & Special Accounts" }

The runner takes a screenshot of the current browser state, sends it alongside the instruction to a vision model, and gets back a concrete Playwright action. The model looks at the page the same way a human would. It finds the element visually and tells the runner what to click.

This closes the gap between how humans describe testing (“click the thing that says First Interview”) and how automation frameworks need instructions (page.click('div.o_statusbar_status button:nth-child(3)')). The vision model is the translator.
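
As a rough illustration of that translation step, here is a minimal Python sketch. It assumes the model is asked to reply with a small JSON object containing viewport coordinates; the article does not say what reply format Sentinel actually uses. The names `ask_model`, `plan_action`, `apply_action`, and the action vocabulary are hypothetical. Only `page.mouse.click`, `page.keyboard.type`, and `page.keyboard.press` are real Playwright calls.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical action vocabulary; the set Sentinel really accepts is not documented here.
ALLOWED_ACTIONS = {"click", "type", "press"}

PROMPT = (
    "You are driving a browser. Look at the screenshot and reply with JSON only, "
    'shaped like {"action": "click", "x": 0, "y": 0, "text": ""}. Instruction: '
)


@dataclass
class Action:
    kind: str       # one of ALLOWED_ACTIONS
    x: float = 0.0  # viewport coordinates, used by "click"
    y: float = 0.0
    text: str = ""  # text to type, or the key name for "press"


def plan_action(screenshot_png: bytes, instruction: str,
                ask_model: Callable[[bytes, str], str]) -> Action:
    """Ask the vision model for one action and validate its reply."""
    reply = ask_model(screenshot_png, PROMPT + instruction)
    data = json.loads(reply)
    kind = data.get("action")
    if kind not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action from model: {kind!r}")
    return Action(kind, float(data.get("x", 0)), float(data.get("y", 0)),
                  str(data.get("text", "")))


def apply_action(page, action: Action) -> None:
    """Execute the validated action with Playwright's mouse and keyboard APIs."""
    if action.kind == "click":
        page.mouse.click(action.x, action.y)
    elif action.kind == "type":
        page.keyboard.type(action.text)
    elif action.kind == "press":
        page.keyboard.press(action.text)
```

Validating the reply against a fixed vocabulary before touching the browser keeps a malformed or unexpected model answer from turning into an arbitrary action.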

Assertions that see what you see

At the end of every test case, the runner takes a final screenshot and sends it to the vision model along with the expected result: a plain English description of what should be visible on screen.

“Stage change appears in the chatter”

The model compares the screenshot against the expectation and returns a verdict: pass or fail, with a one-sentence description of what it actually saw.

This is where the real efficiency gains live. The assertion isn’t checking a DOM node. It’s checking what a human would check: is the right thing visible on the screen? That means our test cases survive Odoo upgrades, theme changes, and layout shifts that would break every selector-based assertion in a traditional suite.
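
A visual assertion of this kind might be sketched as follows. The JSON reply shape, the prompt wording, and the names `ask_model`, `Verdict`, and `check_expectation` are assumptions made for illustration, not Sentinel's actual interface.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    passed: bool
    observed: str  # the model's one-sentence description of what it saw


def check_expectation(screenshot_png: bytes, expected: str,
                      ask_model: Callable[[bytes, str], str]) -> Verdict:
    """Ask the vision model whether the screenshot matches the expected result."""
    prompt = (
        "Compare the screenshot with the expected result. Reply with JSON only: "
        '{"verdict": "pass" or "fail", "observed": "<one sentence>"}. '
        "Expected result: " + expected
    )
    reply = ask_model(screenshot_png, prompt)
    try:
        data = json.loads(reply)
        verdict = data["verdict"]
        observed = str(data["observed"])
    except (ValueError, KeyError, TypeError):
        # Fail closed: an unreadable reply must never count as a pass.
        return Verdict(False, "model reply could not be parsed")
    return Verdict(verdict == "pass", observed)
```

The sketch fails closed on purpose: anything other than an explicit "pass" is recorded as a failure, so a confused model produces a red result for a human to review instead of a false green.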

Template steps for the predictable stuff

Not everything needs AI. Logging in, navigating menus, filling fields, clicking buttons — these follow predictable Odoo patterns that don’t change much between versions. For these, Sentinel provides template steps: deterministic Playwright actions that run without touching the vision model.

{ "type": "template", "action": "login" }
{ "type": "template", "action": "navigate_menu", "params": { "path": ["Recruitment"] } }
{ "type": "template", "action": "fill_field", "params": { "label": "Customer", "value": "Azure Interior" } }
{ "type": "template", "action": "assert_status_bar", "params": { "stage": "First Interview" } }

There are eleven template actions covering login, navigation, record creation, field filling, button clicking, and various assertions. They’re faster and more reliable than natural steps for anything that follows a known pattern. The strategy is simple: use templates for the scaffolding, natural steps for the parts that are genuinely unpredictable.
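
One way to route the two step types is a small registry keyed by action name, sketched below. The registry, the `run_step` function, and the `run_natural` callback are hypothetical, and the `fill_field` body is only a guess that uses Playwright's `get_by_label` locator; Sentinel's real handlers are presumably more Odoo-specific.

```python
from typing import Any, Callable, Dict

# Registry of deterministic handlers, keyed by the "action" name used in a step.
TEMPLATES: Dict[str, Callable[..., None]] = {}


def template(name: str):
    """Decorator that registers a handler under a template action name."""
    def register(fn):
        TEMPLATES[name] = fn
        return fn
    return register


@template("fill_field")
def fill_field(page, label: str, value: str) -> None:
    # Assumption: the field can be located through its visible label.
    page.get_by_label(label).fill(value)


def run_step(page, step: Dict[str, Any],
             run_natural: Callable[[Any, str], None]) -> None:
    """Run one step: templates go to the registry, natural steps to the model."""
    if step["type"] == "template":
        handler = TEMPLATES.get(step["action"])
        if handler is None:
            raise KeyError(f"unknown template action: {step['action']}")
        handler(page, **step.get("params", {}))
    elif step["type"] == "natural":
        run_natural(page, step["instruction"])
    else:
        raise ValueError(f"unknown step type: {step['type']}")
```

With this shape, a template step never reaches the vision model, which is what keeps the scaffolding fast and repeatable.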

Architecture: two services, one shared API

Sentinel is split into two independent services. The web app runs on Cloudflare and handles everything a tester interacts with — projects, instances, sprints, test cases, runs, and results. The runner lives on a DigitalOcean droplet inside a Docker container, polling for work every five seconds.

The web app is a React Router v7 SPA on Cloudflare Pages, backed by D1 for structured data, KV for credentials, and R2 for screenshot storage. The runner communicates through a small internal API protected by both a shared secret and Cloudflare Zero Trust service tokens.
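
For the double protection on the internal API, the runner's request headers could look roughly like this. `CF-Access-Client-Id` and `CF-Access-Client-Secret` are the headers Cloudflare Access expects for service tokens; the `X-Runner-Secret` header and all three environment variable names are assumptions for illustration.

```python
import os
from typing import Dict, Mapping


def runner_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build the headers the runner sends with every internal API call."""
    return {
        # Checked by Cloudflare Zero Trust before the request reaches the app.
        "CF-Access-Client-Id": env["CF_ACCESS_CLIENT_ID"],
        "CF-Access-Client-Secret": env["CF_ACCESS_CLIENT_SECRET"],
        # Hypothetical header name for the application-level shared secret.
        "X-Runner-Secret": env["RUNNER_SHARED_SECRET"],
    }
```

Reading the values from the environment keeps them out of the image and the repository, and a missing variable fails loudly with a `KeyError` at startup.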

When a tester triggers a run from the UI, the runner picks it up on its next poll, launches Chromium, logs into the Odoo instance, and works through every test case sequentially. After each step, template or natural, it takes a screenshot and uploads it to R2. When the run finishes, it posts a summary to Discord.
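
The runner's outer loop can be pictured with the sketch below. The callbacks `fetch_next_run`, `execute_run`, `run_step`, and `upload` are hypothetical stand-ins for the API client, the test executor, and the R2 upload; `page.screenshot()` is the real Playwright call and returns the image bytes. Sleeping only when the queue is empty is a design choice of this sketch, not something the article specifies.

```python
import time
from typing import Any, Callable, Dict, List, Optional

POLL_SECONDS = 5  # the polling interval described above


def poll(fetch_next_run: Callable[[], Optional[Dict[str, Any]]],
         execute_run: Callable[[Dict[str, Any]], None],
         sleep: Callable[[float], None] = time.sleep,
         max_polls: Optional[int] = None) -> None:
    """Ask the web app for pending work; run it, or wait and ask again."""
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        run = fetch_next_run()
        if run is not None:
            execute_run(run)
        else:
            sleep(POLL_SECONDS)


def execute_case(page, case: Dict[str, Any],
                 run_step: Callable[[Any, Dict[str, Any]], None],
                 upload: Callable[[bytes], str]) -> List[str]:
    """Run the steps of one test case in order, screenshotting after each."""
    urls = []
    for step in case["steps"]:
        run_step(page, step)
        urls.append(upload(page.screenshot()))
    return urls
```

The `max_polls` and `sleep` parameters exist only so the loop can be exercised in a test without waiting; in production the loop would run with the defaults.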

Discord as the report layer

We already live in Discord for team coordination, so it made sense to put test reports there too. On completion, Sentinel posts a single message with a summary embed showing the project, sprint, duration, and pass/fail counts. Failed test cases get individual embeds with the expected result, what the model actually saw, and a link to the final screenshot.

Green embed means everything passed. Red means someone needs to look. The team sees it immediately without opening another dashboard.
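
A report payload along those lines could be assembled as in the sketch below, which targets the JSON body of a Discord webhook message. The result dictionary keys, the function name, and the exact colour values are assumptions; the embed fields used (`title`, `description`, `color`, `url`, `fields`) and the limit of ten embeds per message come from Discord's API.

```python
from typing import Any, Dict, List

GREEN, RED = 0x2ECC71, 0xE74C3C  # illustrative colour values, not Sentinel's
MAX_EMBEDS = 10                  # Discord allows at most ten embeds per message


def build_report(project: str, sprint: str, duration_s: float,
                 results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build one message: a summary embed plus one embed per failed case."""
    failed = [r for r in results if r["status"] != "passed"]
    passed_count = len(results) - len(failed)
    embeds = [{
        "title": f"{project} / {sprint}",
        "description": f"{passed_count} passed, {len(failed)} failed "
                       f"in {duration_s:.0f}s",
        "color": RED if failed else GREEN,
    }]
    # Keep the summary and truncate failures so the message stays valid.
    for r in failed[: MAX_EMBEDS - 1]:
        embeds.append({
            "title": r["name"],
            "url": r["screenshot_url"],  # link to the final screenshot
            "color": RED,
            "fields": [
                {"name": "Expected", "value": r["expected"]},
                {"name": "Observed", "value": r["observed"]},
            ],
        })
    return {"embeds": embeds}
```

The returned dictionary can be posted as JSON to a webhook URL. A run with more than nine failures would need a second message or a link back to the web app, which this sketch leaves out.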

The data model

The hierarchy is straightforward: projects own instances and sprints, sprints own test cases and test runs, and each run produces a result per test case with status, the model’s description of what it saw, and screenshot URLs.

All IDs are nanoid strings. Screenshots are stored as JSON arrays of presigned R2 URLs. Credentials never touch the database — they live in KV and are fetched fresh by the runner for every run.
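
Written out as plain Python dataclasses, the hierarchy might look like the sketch below. The class and field names are guesses based on the description above, not the actual D1 schema, and the IDs are shown as ordinary strings with no nanoid generation.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Project:
    id: str
    name: str


@dataclass
class Instance:  # an Odoo environment belonging to a project
    id: str
    project_id: str
    base_url: str


@dataclass
class Sprint:
    id: str
    project_id: str
    name: str


@dataclass
class Case:  # a test case, owned by a sprint
    id: str
    sprint_id: str
    title: str
    expected: str


@dataclass
class Run:  # one execution of a sprint's test cases
    id: str
    sprint_id: str


@dataclass
class CaseResult:  # one row per test case per run
    id: str
    run_id: str
    case_id: str
    status: str    # for example "passed" or "failed"
    observed: str  # the model's description of what it saw
    screenshots: List[str] = field(default_factory=list)

    def screenshots_json(self) -> str:
        """Serialise the screenshot URLs as a JSON array for storage."""
        return json.dumps(self.screenshots)
```

Note that no class carries a password or API key, which matches the rule that credentials stay in KV and out of the database.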

What changed for us

Before Sentinel, a full regression pass across a client’s Odoo modules took a tester most of a day. Now the same suite runs in under an hour with no human involvement beyond the initial setup. The screenshots are taken at exactly the right moment, every time. The assertions check what actually matters: what is visible on the screen, not what a brittle selector happens to match.

The team writes test cases faster because they write in English, not code. New testers contribute on their first day because there’s nothing to learn beyond “describe what you want to click” and “describe what you expect to see.” And when a test fails, the Discord report tells you exactly what went wrong, with a screenshot to prove it.

We’re still early. The vision model isn’t perfect: it occasionally misidentifies elements on dense Odoo forms, and complex multi-step interactions sometimes need to be broken into smaller pieces. But the direction is clear: natural language in, verified results out, evidence attached. That’s the QA workflow I wanted, and Sentinel delivers it.