Why your finance app lies to you about your spending

By Daniil Kozin · ex-banker · Florianópolis

Your finance app is not malicious. It is just optimistic. It tells you your budget is on track because the numbers under the categories add up neatly, and it does not tell you about the three things it had to round, hide, or guess to keep those numbers neat. I have spent two years inside the codebase of one of these apps, and a decade as a user of the others. This is what I would tell a friend.

I built fourteen versions of a money tracker for myself before settling on the fifteenth, which is what Capi is now. I shipped to a few thousand users, watched real spending data flow in, and read enough Reddit threads about Copilot, Monarch, YNAB, Mint, and Rocket Money to fill a small book. The three lies in this essay are the ones I see most often, on the apps I respect and the one I built. They are not deal-breakers. They are just things the dashboard will not tell you. Once you know they are there, you can stop being surprised by them.

Why does my finance app keep putting transactions in the Other category?

The Other category is where every finance app dumps transactions it cannot confidently classify. Copilot Money reports about 93 percent first-pass accuracy, which sounds high until you do the math: on 200 transactions a month, 14 of them land in Other. Most users glance at the totals, never drill into Other, and assume their budget is accurate when roughly 7 percent of their transactions are invisible to the chart.

Copilot has a help-center article called Other Category. It explains that the bucket exists for transactions the model is not sure about and, notably, that it cannot be deleted. You can rename it. You cannot remove it. That detail tells you everything: the app's designers know unclassified transactions are inevitable, so they made a permanent home for them. The accurate framing would be: this number is your auto-categorization failure rate, expressed as dollars. The dashboard framing is: this is just another spending category, like Groceries.

What this hides in practice. Suppose you spend US$ 3,000 a month, distributed across 200 transactions. At Copilot's claimed 93 percent first-pass accuracy, that is 14 transactions in Other. If those transactions average US$ 30 each (a realistic mid-spend), you have US$ 420 of monthly spending, or 14 percent of the total, sitting in a bucket that you probably scanned for five seconds. Over a year, US$ 5,040 of your spending is essentially uncategorized. Rocket Money is more transparent than most about this; its help docs explicitly note the algorithm is constantly improving but not perfect, and Premium adds custom rules so you can force routing.
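The arithmetic above is easy to rerun with your own numbers. Here is a minimal Python sketch; the function name and its parameters are illustrative and do not come from any of the apps mentioned.

```python
def other_bucket_estimate(monthly_transactions, accuracy_pct, avg_missed_amount, monthly_spend):
    """Estimate how much spending sits in Other for a given first-pass accuracy."""
    # Transactions the categorizer fails to classify on the first pass.
    missed = round(monthly_transactions * (100 - accuracy_pct) / 100)
    monthly_dollars = missed * avg_missed_amount
    return {
        "missed_transactions": missed,
        "monthly_dollars": monthly_dollars,
        "yearly_dollars": monthly_dollars * 12,
        "share_of_spend_pct": round(100 * monthly_dollars / monthly_spend, 1),
    }


# Inputs from the example in the text: 200 transactions, 93 percent accuracy,
# US$ 30 per missed transaction, US$ 3,000 of monthly spend.
estimate = other_bucket_estimate(200, 93, 30, 3000)
print(estimate)
```

With those inputs the result is 14 transactions, US$ 420 a month, and US$ 5,040 a year. The dollar share (14 percent) is double the transaction share (7 percent) because US$ 30 is twice the US$ 15 average implied by US$ 3,000 over 200 transactions.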

Why do refunds make my finance app overstate my income?

Most finance apps treat a refund as a positive transaction and route it into Inflow or Income by default. YNAB's own documentation warns against this because it inflates Ready to Assign and lies to you about how much new money you actually have. The honest treatment is to reduce the original category's outflow, not add to income. Almost no app does this automatically.

This is the lie I find most frustrating because it is the easiest to fix at the data-model layer and almost nobody fixes it. The mechanics: you buy a US$ 200 coat on a Tuesday. You return it the next Tuesday. Most apps see the second transaction as a positive number landing in your card account, default-categorize it as Income or Inflow, and now your Income bar looks healthier than it really is while your Clothing category still shows US$ 200 of spend that no longer exists. YNAB's published guidance is explicit: enter the refund as a negative outflow in the original category, not as positive Inflow. Their own docs say if you mishandle this, your reports fall out of sync with reality.
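To make the difference concrete, here is a small Python sketch of the two treatments. The row format, the `refund_of` field, and the US$ 3,000 paycheck are assumptions made for illustration; no app's real data model is implied.

```python
from collections import defaultdict


def summarize(rows, refunds_as_income):
    """Return (income, spend_by_category) under one of two refund treatments.

    Amounts are signed: negative is money out, positive is money in.
    """
    income = 0
    spend = defaultdict(int)
    for row in rows:
        amount = row["amount"]
        if amount < 0:
            spend[row["category"]] += -amount
        elif row.get("refund_of") and not refunds_as_income:
            # Honest treatment: reduce the original category's outflow.
            spend[row["refund_of"]] -= amount
        else:
            # Default treatment: any positive number counts as income.
            income += amount
    return income, dict(spend)


rows = [
    {"category": "Income", "amount": 3000},                          # hypothetical paycheck
    {"category": "Clothing", "amount": -200},                        # the coat
    {"category": "Inflow", "amount": 200, "refund_of": "Clothing"},  # the return
]

naive = summarize(rows, refunds_as_income=True)
honest = summarize(rows, refunds_as_income=False)
print(naive)
print(honest)
```

Under the default treatment, income reads 3,200 and Clothing still shows 200 of spend. Under the honest treatment, income stays at 3,000 and Clothing nets to zero.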

The reason apps do this badly is that the bank feed gives them a positive number on the credit card account and nothing else. There is no semantic link between the original purchase and the refund seven days later. The app would have to either ask you (manual confirmation), match by merchant and amount within a window (probabilistic and prone to error on subscription credits), or default-route to Inflow and let users fix it (which they almost never do). Most apps choose the third path. The chart looks tidy. The truth is hidden.
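The second option, matching by merchant and amount within a window, might look something like the Python sketch below. The field names, the 14-day default window, and the merchants are all hypothetical; this is one plausible heuristic, not how any particular app does it.

```python
from datetime import date, timedelta


def match_refund(refund, purchases, window_days=14):
    """Return the most recent purchase with the same merchant and amount, or None."""
    window = timedelta(days=window_days)
    candidates = [
        p for p in purchases
        if p["merchant"] == refund["merchant"]
        and p["amount"] == refund["amount"]
        and timedelta(0) <= refund["date"] - p["date"] <= window
    ]
    if not candidates:
        return None
    # With several candidates this is a guess: pick the latest purchase.
    return max(candidates, key=lambda p: p["date"])


purchases = [
    {"merchant": "Coat Shop", "amount": 200, "date": date(2026, 3, 3), "category": "Clothing"},
    {"merchant": "StreamCo", "amount": 15, "date": date(2026, 2, 1), "category": "Subscriptions"},
    {"merchant": "StreamCo", "amount": 15, "date": date(2026, 3, 1), "category": "Subscriptions"},
]
coat_refund = {"merchant": "Coat Shop", "amount": 200, "date": date(2026, 3, 10)}
credit = {"merchant": "StreamCo", "amount": 15, "date": date(2026, 3, 5)}

matched = match_refund(coat_refund, purchases)
guess = match_refund(credit, purchases, window_days=45)
print(matched["category"], guess["date"])
```

The coat matches cleanly. The subscription credit shows where this goes wrong: with a 45-day window, two identical charges qualify, and the function simply picks the later one even though the credit may belong to the earlier month.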

How accurate is automatic transaction categorization in 2026?

Copilot Money claims around 93 percent first-pass accuracy on a per-user trained model, which is currently best in class. Monarch and Rocket Money sit lower, with Rocket explicitly admitting the algorithm is constantly improving but not perfect. The 7 to 15 percent of transactions that get missed are not random; they cluster on cash, foreign currencies, peer transfers, and anything that does not look like a retail card swipe.

The 93 percent number is real and well-earned. Copilot's Categories FAQ explains the model is per-user and improves with corrections; after a month of training, accuracy gets high for most users. That is a meaningful product win and worth crediting. The lie is not the 93 percent. The lie is in how the dashboard presents the result: as if the chart is the truth, when 7 percent of the truth is sitting in a permanent bucket called Other.

The clustering matters more than the average. The 7 percent miss rate is heavily skewed toward Venmo, Zelle, Wise, foreign-currency charges, ATM withdrawals, and peer-to-peer Brazilian Pix transfers: anything not on US retail card rails. If you live mostly on Chase or Amex in dollars, your personal accuracy is closer to 97 percent and the Other bucket is small. If you live between three currencies, send rent over Wise, and use Pix or peso-blue more than cards, your personal accuracy is closer to 70 percent and the Other bucket is structurally large. The app does not tell you which user you are.
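One way to see why the average hides so much is to treat personal accuracy as a weighted average over payment rails. In the Python sketch below, the per-rail accuracy figures (99 and 55 percent) are invented for illustration and chosen only so the blend lands near the 97 and 70 percent figures above; they are not measurements from any app.

```python
def blended_accuracy(mix_pct, accuracy_pct):
    """Weighted average of per-rail accuracy, weighted by share of transactions."""
    if sum(mix_pct.values()) != 100:
        raise ValueError("transaction shares must sum to 100")
    return sum(mix_pct[rail] * accuracy_pct[rail] for rail in mix_pct) / 100


# Hypothetical per-rail accuracy, in percent. Not published by any vendor.
ASSUMED_ACCURACY = {"us_card": 99, "everything_else": 55}

card_heavy = blended_accuracy({"us_card": 95, "everything_else": 5}, ASSUMED_ACCURACY)
multi_currency = blended_accuracy({"us_card": 35, "everything_else": 65}, ASSUMED_ACCURACY)
print(card_heavy, multi_currency)
```

Under these assumed rates, the same categorizer yields about 96.8 percent for the card-heavy user and about 70.4 percent for the multi-currency user. The model did not change; the mix did.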

What happened to Mint and why does that matter for the apps that replaced it?

Intuit shut Mint down on March 23, 2024 and pushed users to Credit Karma, which kept account aggregation but dropped budgeting, category trends, and bill tracking. The lesson is not about Mint specifically. It is that a finance app built on a bank-feed integration can be turned off by a corporate decision, and your three years of categorized history goes with it.

I used Mint from 2017 to 2022 and stopped using it well before the official shutdown, because the auto-categorization had quietly gotten worse and the rules I had written did not survive a UI refresh. When the shutdown was announced in November 2023, I was not surprised. I was already drifting. What I did not expect was Intuit telling everyone to migrate to Credit Karma and quietly omitting that Credit Karma is a credit-score product, not a budget tool. Account balances and three years of transactions transferred. Monthly budgets, category trends, bill view, all gone.

The plain read on Mint: it had been free for so long that users mistook free for permanent. It was actually free because Intuit was monetizing the data and the credit-card lead generation, and when that business stopped working at the scale Intuit needed, the product ended. Every replacement app (Monarch at US$ 99.99/year Core or US$ 199/year Plus, Copilot at US$ 13/month or about US$ 95/year, YNAB at US$ 109/year) charges money because someone has to. The lie was not Mint's quality. The lie was the implicit promise that a free product would still exist next year.

What does my finance app get right despite the lies?

The bank-feed pull. Auto-importing card transactions is genuinely useful and the right base layer. About 60 to 75 percent of spending for a normal user lives on connected cards and gets pulled in cleanly. The lie is not the pull. The lie is the dashboard pretending the other 25 to 40 percent (cash, peer transfers, refunds, foreign currency, anything pre-feed) does not exist or is safely in Other.

Crediting the apps where credit is due. Copilot's per-user trained model really is the best categorizer I have used. Monarch's UX for joint-account households is genuinely better than its competitors. YNAB's envelope philosophy is the most rigorous budgeting framework that has shipped in the last decade. The bank-feed integration via Plaid, MX, and Finicity is hard infrastructure and the fact that any of these apps work at all is a credit to the people who built that layer.

What none of them do is tell you the structural limits of the pull. They tell you what is in the chart. They do not tell you what is missing from it. That is the gap I tried to close with Capi, not because I have a better algorithm (I do not), but because chat is a different surface where the gap is harder to hide.

The minimum honest dashboard would show four numbers: total spend on connected accounts, total spend in Other, total spend you entered manually, and total spending you suspect is missing entirely (the gap). I have seen approximately zero apps show all four. Copilot shows the first two. Capi shows the first three. The fourth is the one no app can know, because it is the thing it did not see.
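As a data structure, the four numbers could be as small as the Python sketch below. The class, the field names, and the row format are hypothetical, and it assumes Other is a subset of the connected-account total. The fourth field defaults to "unknown" because only the user can estimate it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HonestDashboard:
    connected: int                # total spend pulled from connected accounts
    other: int                    # the part of connected spend sitting in Other
    manual: int                   # spend the user typed in
    suspected_gap: Optional[int]  # user's own estimate; None means unknown


def build_dashboard(rows, suspected_gap=None):
    """Aggregate rows into the four numbers; the gap is never inferred."""
    connected = sum(r["amount"] for r in rows if r["source"] == "feed")
    other = sum(
        r["amount"] for r in rows
        if r["source"] == "feed" and r["category"] == "Other"
    )
    manual = sum(r["amount"] for r in rows if r["source"] == "manual")
    return HonestDashboard(connected, other, manual, suspected_gap)


rows = [
    {"source": "feed", "category": "Groceries", "amount": 120},
    {"source": "feed", "category": "Other", "amount": 30},
    {"source": "manual", "category": "Cash", "amount": 50},
]
dashboard = build_dashboard(rows)
print(dashboard)
```

Keeping `suspected_gap` as an explicit `None` rather than zero is the point of the design: the dashboard admits it does not know, instead of implying there is nothing missing.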

Should I just keep using a spreadsheet then?

Most people who tell me they switched back to spreadsheets did it for the wrong reason. A spreadsheet does not categorize better than Copilot; it categorizes differently because you do it manually. The right answer is not to abandon apps but to learn what your app is hiding: open the Other bucket monthly, audit refunds quarterly, and tag anything in cash with the same care you give the auto-imported rows.

The spreadsheet refugees I respect are not the ones who think Sheets is more accurate. They are the ones who think the act of typing each row is the budget. They are right about that. The friction of manual entry is what creates awareness. The chart's job is then to confirm what you already noticed. Apps with full auto-import remove the friction and therefore remove the awareness; you do not notice your spending because you never see the individual rows. That is a real loss, even if the totals are technically more accurate.

The middle ground I landed on after fourteen failed apps and a few thousand Capi users: chat-first manual entry, plus optional bank-statement reupload as a sanity check (see the reupload test methodology). Type the transaction in the chat the moment you spend. The dashboard is built from the chat. The chat is the source of truth. The dashboard is a derivative view. When something looks off in the dashboard, you scroll the chat and the messy reality is right there. There is no Other bucket because every transaction came from you.

How does Capi handle the Other-bucket and refund problems?

Capi does not pretend to solve auto-categorization at 99 percent. It does two things differently. First, it surfaces every uncategorized transaction in the daily chat, so nothing hides in Other for 30 days. Second, refunds are entered as negative outflow into the original category, not income, because the entry is a chat message and you can write minus instead of plus. Manual chat-first is the trade for honesty.

The Capi pitch in plain terms. It runs inside Telegram. You type "lunch 18" and the bot categorizes, confirms, and stores. If the category is wrong, you correct it inside the chat (one tap on the suggestion buttons). The dashboard at /spend shows monthly totals, but the chat above it is what you trust because you wrote it. The free tier covers 30 transactions/month. Core is US$ 9.90/month or US$ 69.90/year; the annual plan is meaningfully cheaper than Monarch Core (US$ 99.99/year) and YNAB (US$ 109/year), and closer in price to Copilot (about US$ 95/year).
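For a sense of how little machinery a message like "lunch 18" needs, here is a Python sketch of a parser for that shape of input. It is not Capi's actual code: the regular expression, the `Entry` type, and the shortcut of using the typed label as the category are all assumptions for illustration.

```python
import re
from decimal import Decimal
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    label: str
    amount: Decimal  # positive = outflow, negative = refund against the same label


# Free-text label, whitespace, then a signed amount with up to two decimals.
ENTRY_PATTERN = re.compile(r"^\s*(?P<label>.+?)\s+(?P<amount>[+-]?\d+(?:\.\d{1,2})?)\s*$")


def parse_entry(message: str) -> Optional[Entry]:
    """Parse 'lunch 18' or 'coat -200'; return None if the message does not fit."""
    match = ENTRY_PATTERN.match(message)
    if match is None:
        return None
    return Entry(match["label"].lower(), Decimal(match["amount"]))


def category_totals(entries):
    """Sum signed amounts per label, so a refund nets against the purchase."""
    totals = {}
    for entry in entries:
        totals[entry.label] = totals.get(entry.label, Decimal(0)) + entry.amount
    return totals


messages = ["lunch 18", "coat 200", "coat -200"]
entries = [parse_entry(m) for m in messages]
totals = category_totals(entries)
print(totals)
```

Because the refund is typed as a negative amount under the same label, the coat nets to zero and nothing is added to income.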

Capi Together (US$ 99/year for the household) adds a shared chat where partners both post transactions and both see them in real time. The same principle applies: no Other bucket, refunds entered as negative outflow on the original category. Plain pitches over slick dashboards, because the dashboards I have audited inside other apps were never as accurate as their users thought. See the 2026 money tracker comparison for where Capi fits next to Copilot and YNAB.

Which lie should I worry about first?

The lie	Apps it affects	Severity	Fix this month
Other bucket	All auto-feed apps	High	Open Other every Sunday; recategorize all rows
Refund as income	YNAB unless you fix manually, most others by default	Medium-High	Enter refunds as negative outflow on original category
Auto-categorization confidence	Copilot, Monarch, Rocket, Mint-era apps	Medium	Audit one full week of transactions manually each month
Hidden gap (untracked entirely)	Bank-feed apps without manual entry	High	Track cash, Wise, Pix, peer transfers manually for one month

Written by Daniil Kozin, founder of Capi. More from this series: Best money tracker 2026 · Confessions of an indie finance app builder · Why my budget app duplicates transactions · Bank statement reupload test · Capi vs Monarch · Capi vs YNAB.