Text vs tap: why typing "coffee 4.50" beats opening an app
Every expense tracker asks for six taps. Capi asks for a sentence. Here's why that one design decision changes whether you still log anything three weeks from now.
I've watched a specific thing happen for years now, first in a bank relationship team, then with friends, then with my own phone. Someone installs a thoughtful expense tracker on a Sunday evening. They tag a few transactions. They feel, briefly, like a person in charge of their life. By Wednesday the app is a small guilt icon on screen three. By the end of the month it's uninstalled.
The app wasn't bad. The categories were sensible. The charts were beautiful. What broke wasn't the product's logic — it was the friction between noticing a spend and recording a spend.
The six-tap problem
Imagine you just paid for coffee. To log that in a normal tracker, you usually have to:
- unlock your phone,
- find the app icon (often in a folder you made on Sunday),
- wait for a splash screen,
- press "+ Add expense,"
- type the amount,
- pick a category from a dropdown,
- pick a payment method,
- sometimes pick a date,
- press Save.
Nine steps, six of them taps. Four or five of them are you proving to the app that you're still paying attention. And all of this happens while the barista is handing you your drink and the next person in line is already ordering.
So you don't do it. You tell yourself you'll log it later. You don't. Three weeks later you look at the app's empty dashboard and feel the same flavor of shame you felt about the gym membership.
The problem isn't laziness. The problem is that noticing is a different cognitive mode than tagging.
What happens when the interface is a sentence
Capi is built around one observation: you already know how to text. You do it hundreds of times a day. Your thumbs are pre-trained. The keyboard is already open half the time. There is no app to find, no category to pick, no "which account?" modal.
You open Telegram, which you already had open for something else, and you type: coffee 4.50. That's it. Capi replies in a breath with the category it guessed and the amount in your reference currency. If it got the category wrong, you tap one inline button to fix it, and Capi remembers that correction for next time.
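For the curious, here is a rough sketch of what turning a sentence into a transaction could look like. It is not Capi's actual code: the names ExpenseParser, DEFAULT_CATEGORIES and MESSAGE_PATTERN, the keyword table, and the "description, then amount" message shape are all assumptions made for illustration, and currency conversion is left out entirely.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical keyword-to-category defaults; not Capi's real rules.
DEFAULT_CATEGORIES = {"coffee": "Food & drink", "gym": "Health", "bus": "Transport"}

# Assumed message shape: "<description> <amount>", e.g. "coffee 4.50" or "bus ticket 2,80".
MESSAGE_PATTERN = re.compile(r"^\s*(?P<desc>.+?)\s+(?P<amount>\d+(?:[.,]\d{1,2})?)\s*$")


@dataclass
class Expense:
    description: str
    amount: Decimal
    category: str


class ExpenseParser:
    """Guesses a category from the description and remembers user corrections."""

    def __init__(self) -> None:
        # Corrections made by the user, keyed by lowercased description.
        self.learned = {}

    def parse(self, text: str) -> Optional[Expense]:
        match = MESSAGE_PATTERN.match(text)
        if match is None:
            return None  # Not an expense message, e.g. "hello".
        desc = match["desc"].lower()
        amount = Decimal(match["amount"].replace(",", "."))
        # A learned correction wins over the default guess.
        category = self.learned.get(desc) or DEFAULT_CATEGORIES.get(desc, "Uncategorized")
        return Expense(desc, amount, category)

    def correct(self, description: str, category: str) -> None:
        # Stands in for the one-tap inline fix described above.
        self.learned[description.lower()] = category


parser = ExpenseParser()
first = parser.parse("coffee 4.50")    # category guessed from the defaults
parser.correct("coffee", "Work")       # the user fixes the guess once
second = parser.parse("coffee 4.50")   # the correction is reused next time
```

The point of the sketch is the shape of the interaction: one line of text in, one guess out, and a correction that only has to be made once.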
The interaction isn't minimal because we ran out of ideas. It's minimal because the only way you'll still be logging transactions in month three is if logging one costs you less than the act of remembering you should log it.
The cost of "just one more tap"
Product people talk about friction like it's a slider you turn down. It isn't. Friction compounds. Each extra tap isn't just three seconds — it's three seconds and a 5% chance you close the app instead. String enough of those together and you get the honest answer to the question "why didn't you log it?": because I was going to, and then I wasn't.
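To put rough numbers on "friction compounds", here is a back-of-the-envelope sketch. It takes the 5% figure above at face value and assumes every step carries the same, independent chance of giving up; both are simplifications for illustration, not measured data, and completion_probability is a made-up helper name.

```python
def completion_probability(steps: int, drop_per_step: float = 0.05) -> float:
    """Chance of finishing every step if each one independently loses drop_per_step of attempts."""
    return (1.0 - drop_per_step) ** steps


# Treats a text message as a single step, which is itself an assumption.
for label, steps in [("one text message", 1), ("six taps", 6), ("nine-step form", 9)]:
    print(f"{label}: {completion_probability(steps):.0%} of expenses get logged")
```

Under those assumptions, about 95% of expenses survive a single step, roughly 74% survive six, and roughly 63% survive nine. The individual taps look harmless; the product of them does not.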
What makes the text-first approach survive Wednesday isn't that it's faster by the stopwatch. It's that it never competes with your attention. You don't have to switch modes. You don't have to remember what category the gym falls under this month. You already had the thought "I just spent 4.50 on coffee" — you just let it land as a message instead of forgetting it.
What you trade for the simplicity
I'll be honest: a text-first tracker is worse than a form-based one at exactly one thing, which is entering a transaction you weren't going to enter anyway. If your instinct is to sit down on Sunday and reconstruct the week from your bank statement, Capi's receipt-photo mode and CSV upload cover that — but the product's center of gravity is the short message you send ninety seconds after spending.
Which is, I'd argue, the only interaction that actually builds the habit. Everything else is an audit.
Try the sentence.
Open Telegram, say hi to Capi, and text your next coffee.
Start free on Telegram →