June 21, 2026
Why a WhatsApp bot is a system, not a chat
The demo is deceptively simple: a message arrives, you call the model, you reply. Five lines. In production, those five lines become a system. Here’s why.
The “fast 200” problem
WhatsApp sends messages to your webhook and expects a prompt 200 response. If you’re slow (and generating images with AI takes seconds), Meta treats the delivery as failed and retries the same message.
The consequence? If your webhook generates the images inline, every retry fires off another generation. You just paid two, three or four times for the same reply. In a service where every generation costs money, that bleeds you dry.
The fix is to separate two things that look like one:
- Acknowledge (return 200 in milliseconds) and queue the work.
- Do the work (generate) in a separate process that reads from the queue.
We use Cloud Tasks, a queue that calls back into our own service. The webhook never touches the AI: it only validates, enqueues and responds.
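A minimal sketch of that split, assuming an in-memory `queue.Queue` as a stand-in for Cloud Tasks and a simplified WhatsApp payload (`entry`, then `changes`, then `value`, then `messages`). The names `handle_webhook`, `worker_step` and `generate_designs` are hypothetical, and signature checking is left out here to keep the focus on "acknowledge, then work".

```python
import json
import queue
from typing import Optional

# In-memory stand-in for Cloud Tasks (assumption: in production this would be an
# HTTP task that calls back into our own worker endpoint).
TASK_QUEUE: "queue.Queue[dict]" = queue.Queue()


def handle_webhook(raw_body: bytes) -> int:
    """Parse, enqueue and return an HTTP status right away; never calls the AI."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return 400
    if not isinstance(payload, dict):
        return 400
    # Simplified WhatsApp payload shape: entry -> changes -> value -> messages.
    for entry in payload.get("entry", []):
        for change in entry.get("changes", []):
            for msg in change.get("value", {}).get("messages", []):
                TASK_QUEUE.put({"message_id": msg["id"], "from": msg.get("from")})
    return 200  # acknowledge quickly so Meta has no reason to retry


def generate_designs(task: dict) -> str:
    """Hypothetical placeholder for the slow, paid image generation."""
    return f"designs for {task['message_id']}"


def worker_step() -> Optional[str]:
    """Runs in a separate worker: take one task and do the expensive work."""
    try:
        task = TASK_QUEUE.get_nowait()
    except queue.Empty:
        return None
    return generate_designs(task)
```

The webhook's latency no longer depends on the model: however long `generate_designs` takes, `handle_webhook` has already returned.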
But queues retry too
You’ve solved Meta’s retries… and inherited the queue’s. Cloud Tasks is at-least-once: it can deliver the same task more than once. Do nothing and you’re back to square one: duplicate generations.
The defense is an atomic deduplication claim: before generating, you try to “claim” that message_id in a database transaction. If it was already claimed, you stop; if not, you mark it as claimed and continue. It has to be atomic so that two simultaneous deliveries can’t both claim the same message.
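A minimal sketch of the claim, using SQLite from Python’s standard library as a stand-in for whatever database you actually use. The `PRIMARY KEY` makes the database itself reject a second insert of the same `message_id`, so of two concurrent deliveries only one can win. Table and function names are hypothetical.

```python
import sqlite3
from typing import Callable


def open_claims_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # PRIMARY KEY: the database enforces "at most one claim per message_id".
    conn.execute("CREATE TABLE IF NOT EXISTS claims (message_id TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def try_claim(conn: sqlite3.Connection, message_id: str) -> bool:
    """Return True if this delivery won the claim, False if it was already claimed."""
    try:
        with conn:  # transaction: commits on success, rolls back and re-raises on error
            conn.execute("INSERT INTO claims (message_id) VALUES (?)", (message_id,))
        return True
    except sqlite3.IntegrityError:  # duplicate key: someone else claimed it first
        return False


def handle_task(conn: sqlite3.Connection, message_id: str,
                generate: Callable[[str], str]) -> str:
    if not try_claim(conn, message_id):
        return "duplicate: skipped"
    return generate(message_id)
```

One design choice this sketch leaves open: once claimed, a message is never processed again, even if generation then fails. A real system might store a status next to the claim so a failed attempt can be released and retried.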
Signature, authentication and the open endpoint
Your webhook is public: anyone on the internet can POST to it. You must validate the signature of each request (an HMAC with your app secret) and reject what doesn’t match. And the internal endpoint that does the paid work must require authentication (OIDC): leave it open and anyone can trigger generations on your dime.
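A sketch of the signature check, assuming Meta’s convention of sending `sha256=` followed by a hex HMAC-SHA256 of the raw request body in the `X-Hub-Signature-256` header. `verify_signature` is a hypothetical helper; the OIDC check on the internal endpoint is not shown.

```python
import hashlib
import hmac
from typing import Optional

PREFIX = "sha256="


def verify_signature(raw_body: bytes, header_value: Optional[str], app_secret: str) -> bool:
    """Return True only if the header carries a valid HMAC-SHA256 of the raw body."""
    if not header_value or not header_value.startswith(PREFIX):
        return False
    # Compute over the exact bytes received, never over re-serialized JSON.
    expected = hmac.new(app_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(PREFIX):]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

Requests that fail this check should be rejected before anything is enqueued.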
The state machine that doesn’t hang
A conversation has phases: ask for a name, theme, style, generate, choose, buy. That’s a state machine. And it has to survive reality: what if the AI fails halfway? If the user sends a voice note where you expected text? If they come back three hours later? If they tap “generate” twice in a row?
Each of those cases is a branch that, if you don’t handle it, becomes a hung bot or a double charge. So the generating state acts as a lock, an AI failure reverts to a safe state, and an off-script message doesn’t advance the conversation.
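A small sketch of such a state machine, with hypothetical state names taken from the phases above. It encodes the three rules in the text: `GENERATING` acts as a lock, an AI failure reverts to a safe state, and off-script input (here, any non-text message) leaves the state unchanged. Generation runs synchronously for brevity; in the real system it would happen in the queue worker.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    ASK_NAME = auto()
    ASK_THEME = auto()
    ASK_STYLE = auto()
    GENERATING = auto()
    CHOOSING = auto()
    BUYING = auto()


@dataclass
class Conversation:
    state: State = State.ASK_NAME
    answers: dict = field(default_factory=dict)


# Text-input states: which answer each one fills and which state comes next.
TEXT_STEPS = {
    State.ASK_NAME: ("name", State.ASK_THEME),
    State.ASK_THEME: ("theme", State.ASK_STYLE),
    State.ASK_STYLE: ("style", State.GENERATING),
}

GenerateFn = Callable[[dict], List[str]]


def run_generation(conv: Conversation, generate: GenerateFn) -> str:
    try:
        designs = generate(conv.answers)
    except Exception:
        # AI failure: fall back to a safe state instead of staying stuck in GENERATING.
        conv.state = State.ASK_STYLE
        return "Something went wrong. Please send your style again."
    conv.answers["designs"] = designs
    conv.state = State.CHOOSING
    return f"Here are {len(designs)} designs."


def on_message(conv: Conversation, kind: str, text: str, generate: GenerateFn) -> str:
    if conv.state is State.GENERATING:
        # Lock: a second "generate" tap (or any message) while generating starts nothing new.
        return "Still working on your designs..."
    if conv.state in TEXT_STEPS:
        if kind != "text" or not text.strip():
            # Off-script input, e.g. a voice note, does not advance the conversation.
            return "Please reply with text."
        key, next_state = TEXT_STEPS[conv.state]
        conv.answers[key] = text.strip()
        conv.state = next_state
        if next_state is State.GENERATING:
            return run_generation(conv, generate)
        return f"Got your {key}."
    if conv.state is State.CHOOSING:
        designs = conv.answers.get("designs", [])
        choice = text.strip()
        if kind == "text" and choice.isdigit() and 1 <= int(choice) <= len(designs):
            conv.answers["choice"] = designs[int(choice) - 1]
            conv.state = State.BUYING
            return "Great choice. Moving on to checkout."
        return f"Reply with a number from 1 to {len(designs)}."
    return "Your order is in progress."
```

Timeouts (the user returning three hours later, or a worker dying while holding the `GENERATING` lock) are not handled here; one option is to store a timestamp with the state and treat a stale lock as a failure.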
And on top of that, limits
Because each generation costs money, you need a rate limit that stops abuse without getting in the way of normal use, checked inside a transaction before each generation round. Plus metrics, so you tune it with real data instead of guessing.
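A sketch of a per-user limit checked inside a transaction, using a fixed-window counter (one of several possible rate-limiting schemes, not necessarily the one used in production) in SQLite. `BEGIN IMMEDIATE` takes the write lock before the read, so two concurrent requests can’t both read the same count and both pass. The limit values and function names are hypothetical, meant to be tuned from real metrics.

```python
import sqlite3
import time
from typing import Optional

MAX_ROUNDS_PER_WINDOW = 5  # hypothetical limit
WINDOW_SECONDS = 3600      # hypothetical window length (one hour)


def open_usage_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage ("
        "user_id TEXT PRIMARY KEY, window_start REAL NOT NULL, rounds INTEGER NOT NULL)"
    )
    return conn


def try_consume_round(conn: sqlite3.Connection, user_id: str,
                      now: Optional[float] = None) -> bool:
    """Atomically check and count one generation round; False means over the limit."""
    now = time.time() if now is None else now
    # Take the write lock before reading, so check-then-increment is atomic.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT window_start, rounds FROM usage WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None or now - row[0] >= WINDOW_SECONDS:
            # First round, or the previous window expired: start a new window.
            conn.execute(
                "INSERT OR REPLACE INTO usage (user_id, window_start, rounds) VALUES (?, ?, 1)",
                (user_id, now),
            )
            allowed = True
        elif row[1] < MAX_ROUNDS_PER_WINDOW:
            conn.execute("UPDATE usage SET rounds = rounds + 1 WHERE user_id = ?", (user_id,))
            allowed = True
        else:
            allowed = False
        conn.execute("COMMIT")
        return allowed
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Logging every `False` result (and how close normal users get to the limit) is a cheap source for the metrics the text mentions.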
The takeaway
None of this shows up in the conversation. The customer just types “I want a gift” and gets designs. But underneath there are queues, transactions, signatures, authentication, a state machine and limits — the invisible scaffolding that separates a demo from something you can leave running unattended.
Building that once, well, is exactly the work a shop shouldn’t have to do. That’s why Taituri exists.
— The Taituri team