Can I use ChatGPT as a personal finance tracker?

You can talk to ChatGPT about your spending, but it cannot function as a real tracker. ChatGPT does not reliably persist your transaction history across sessions, cannot run scheduled digests, will hallucinate totals in long conversations, and has no structured database to query. A dedicated tracker keeps data in a persistent store, runs on a schedule, and stays consistent. Capi is a chat-native tracker on Telegram that does this.

What about ChatGPT's memory feature — can't I just use that?

ChatGPT's memory is a small summary the model maintains about you. It is not a ledger. It was not designed to store hundreds of individual transactions with amounts, dates, currencies, and categories. Ask ChatGPT for last month's grocery total and it will give you a plausible-sounding number that does not come from an actual database.

Why is a dedicated tracker better than an LLM for money?

Six reasons: persistent structured storage, deterministic math at scale, accurate multi-currency conversion with live FX, consistent category taxonomy, genuinely private local data, and scheduled outbound nudges. LLMs are great at parsing the input; they are bad at the six things that make a tracker actually work.

April 20, 2026 · 8 min read

Essay

Why ChatGPT is worse than a real tracker for personal finance

It feels like ChatGPT should be able to track your spending. It has memory now. It can do math. It speaks your language. And yet — the moment you try to use it as your actual ledger, it falls apart in six specific, predictable ways.

I tried it. For three weeks I logged every expense to a single chat with ChatGPT, using the memory feature, asking for totals, asking for breakdowns. It was seductive for about four days and broken by day ten. Here is exactly where it breaks, and why a dedicated tracker — any dedicated tracker, not just the one I'm building — will always win for this job.

1. ChatGPT has no real ledger

ChatGPT's "memory" is a small, fuzzy summary the model maintains about you. It was designed to remember that you prefer terse answers or that you have a dog named Olive. It was never designed to hold a list of three hundred transactions, each with an amount, a date, a currency, a merchant, and a category.

What it actually stores, for a user who's been logging money, is something like: "User tracks expenses. Recent items include coffee, groceries, rent." That's not a ledger. That's a vibe.

Ask it "what did I spend on groceries in March?" and it will produce a number. The number will sound plausible. It did not come from a database — it came from the model's best guess at what such a number might be given the vibe. On day four this feels like magic. On day fourteen, when you catch a 30% error on a real number, the whole edifice falls down.

2. The math gets worse as the conversation gets longer

LLMs do arithmetic by predicting the most likely next token, not by calculating. On short lists they get it right — the pattern is obvious. On long lists they drift. By the time you have a few hundred transactions in a chat's context, asking for a sum is a gamble.

Worse: the gamble is silent. ChatGPT won't tell you "I'm not sure about this total." It will hand you $1,247.30 with a confident tone and a breakdown by week. You have no way, inside the chat, to audit it.

A dedicated tracker runs SELECT SUM(amount) FROM transactions WHERE category='groceries' AND month='2026-03'. That's it. The answer is either right or the query fails with a visible error. There's no middle ground where it hallucinates a believable number.
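Here is a minimal sketch of that query using Python's built-in sqlite3 module. The table layout is an assumption made for illustration (amounts as integer cents, a month column in YYYY-MM form); it is not Capi's actual schema.

```python
import sqlite3

def monthly_total_cents(conn, category, month):
    """Sum one category for one month and return integer cents."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_cents), 0) FROM transactions "
        "WHERE category = ? AND month = ?",
        (category, month),
    ).fetchone()
    return row[0]

# Hypothetical in-memory ledger with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (amount_cents INTEGER, category TEXT, month TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        (4520, "groceries", "2026-03"),
        (1899, "groceries", "2026-03"),
        (350, "coffee", "2026-03"),
        (7200, "groceries", "2026-02"),
    ],
)

march_groceries = monthly_total_cents(conn, "groceries", "2026-03")
print(f"{march_groceries / 100:.2f}")
```

Storing cents as integers keeps the sum exact, and the same query returns the same answer no matter how many rows the table holds.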

3. Multi-currency is a disaster

Try this with ChatGPT: log "40 euros at the cafe" on Tuesday, "50 dollars for gas" on Friday, "1,200 pesos for groceries" on Saturday. Now ask it: "how much did I spend this week in dollars?"

You'll get an answer. It will be wrong. Either the model used a stale rate from its training data (months or years out of date), invented a rate that sounds right, or, on a good day, refused to try and told you to check a converter. None of these is useful when you live between countries.

A real tracker hits a live FX rate at the moment you log, stamps the converted amount alongside the native one, and never has to re-convert. The week total is correct because every row was already normalized when it was written.
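The convert-once-at-write-time idea can be sketched like this. The rates below are invented placeholders, not real market rates, and the Row type and log_expense function are hypothetical names; a real tracker would fetch the current rate at the moment of logging instead of reading a hard-coded table.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP

# Made-up rates to USD, for illustration only.
ILLUSTRATIVE_RATES_TO_USD = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.10"),
    "ARS": Decimal("0.001"),
}

@dataclass(frozen=True)
class Row:
    native_amount: Decimal
    native_currency: str
    rate_to_usd: Decimal   # the rate used at write time, kept for auditing
    usd_amount: Decimal    # converted once, never recomputed

def log_expense(amount, currency, rates=ILLUSTRATIVE_RATES_TO_USD):
    """Convert at write time and store both the native and the USD amount."""
    native = Decimal(amount)
    rate = rates[currency]
    usd = (native * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return Row(native, currency, rate, usd)

week = [
    log_expense("40", "EUR"),
    log_expense("50", "USD"),
    log_expense("1200", "ARS"),
]

# The weekly total is a plain sum over values that were already normalized.
week_total_usd = sum((row.usd_amount for row in week), Decimal("0"))
print(week_total_usd)
```

Because each row carries the rate it was converted with, a later rate change cannot silently alter last week's total.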

This is the friction that kills finance apps for expats, freelancers, and anyone paid in one currency and spending in another. The real product isn't the graph — it's the fact that the graph is correct when you're spending across three currencies in a single week. Capi handles USD, EUR, GBP, BRL, RUB, ARS, and any pair reachable via bank FX. It's the default, not a Premium upsell.

4. The category taxonomy drifts on every turn

ChatGPT calls your supermarket run "groceries" on Monday, "food" on Tuesday, and "supermarket" on Friday. Ask for "food spending" and it might give you Tuesday's line. Ask for "groceries" and you might get Monday's. Ask for "supermarket" and you get Friday's. The model doesn't have a fixed list of categories it adheres to — it uses whatever word seemed natural given the surrounding conversation.

This is fine for casual talk. It's catastrophic for budgeting. The whole point of categories is that they're stable containers you can compare week over week. A taxonomy that shifts based on how the question was phrased is not a taxonomy.

A dedicated tracker has a fixed category schema. When Capi sees "groceries", "food shopping", and "supermarket", they all route to the same category slug (food_drink → sub-category groceries). Next month's report compares to last month's report on the same axis.
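A fixed schema can be as simple as a lookup table. The sketch below is illustrative: the alias table and the "uncategorized" fallback are assumptions, and only the food_drink / groceries pair comes from the example above.

```python
# Every known phrasing maps to one (category, sub_category) pair.
CATEGORY_ALIASES = {
    "groceries": ("food_drink", "groceries"),
    "food shopping": ("food_drink", "groceries"),
    "supermarket": ("food_drink", "groceries"),
}

# Hypothetical fallback for labels the table does not know.
FALLBACK = ("uncategorized", None)

def normalize_category(label):
    """Map a free-text label to a stable (category, sub_category) pair."""
    key = " ".join(label.lower().split())  # lowercase and collapse whitespace
    return CATEGORY_ALIASES.get(key, FALLBACK)

print(normalize_category("Supermarket"))
print(normalize_category("food  shopping"))
```

The lookup is deterministic: the same label always lands in the same bucket, regardless of what else was said in the conversation.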

5. No scheduled outbound nudge

This is the one that surprised me most when I thought about it. ChatGPT cannot, structurally, message you on Sunday morning with a digest. It has no cron. It has no outbound capability. It waits for you to open the app and start a conversation.

The single most valuable moment in personal finance software isn't the moment you log an expense. It's the scheduled, unprompted moment once a week where something tells you "here's where the money actually went, and here's what shifted." That moment is when behavior changes. Without it you're logging into a void.

A dedicated tracker runs on a schedule. Capi sends you a short digest every Sunday morning — total, top categories, month-over-month drift, one gentle nudge. You didn't have to open anything. That's the product.
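The scheduling half of that is small. Here is a rough sketch of the two pieces a weekly digest needs: working out when the next Sunday morning is, and summarizing the week's rows. The 09:00 send time, the function names, and the digest format are all assumptions, and the sketch uses naive datetimes, so a real service would also have to handle each user's time zone.

```python
from datetime import datetime, time, timedelta

def next_sunday_morning(now, hour=9):
    """Return the next Sunday at hour:00 strictly after `now`."""
    days_ahead = (6 - now.weekday()) % 7  # weekday(): Monday is 0, Sunday is 6
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), time(hour))
    if candidate <= now:
        # It is already Sunday and the send time has passed.
        candidate += timedelta(days=7)
    return candidate

def build_digest(rows):
    """rows: iterable of (category, amount_cents) pairs for the past week."""
    totals = {}
    for category, cents in rows:
        totals[category] = totals.get(category, 0) + cents
    top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:3]
    lines = [f"Week total: {sum(totals.values()) / 100:.2f}"]
    lines += [f"{name}: {cents / 100:.2f}" for name, cents in top]
    return "\n".join(lines)

print(next_sunday_morning(datetime(2026, 4, 20, 12, 0)))
print(build_digest([("groceries", 6419), ("coffee", 350), ("groceries", 1000)]))
```

A scheduler (cron, a job queue, or a simple sleep loop) would call these at the computed time and send the text to the user.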

6. Privacy model is wrong for ledger data

Your ChatGPT conversations, by default, can be used to train future models. There's a toggle to turn that off, but most users never touch it. And even with the toggle off, the conversation sits on a server you don't own, under a TOS that can change, inside an account that's tied to a work or personal email that might be shared.

A spending ledger is one of the more intimate data sets you'll ever create. Every coffee, every drugstore run, every restaurant. Putting it in a chat designed for general-purpose questions means it's commingled with everything else you've ever asked the model, and subject to whatever policy that company has at any given moment.

Capi stores each user's transactions in a per-user SQLite file. No ads. No third-party analytics on the transaction data. Telegram handles authentication. There's a /delete_me command that wipes everything on request. This isn't a nice-to-have for a money tracker. It's the baseline.
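One way a file-per-user layout makes deletion simple is sketched below. This is an illustration under assumptions, not Capi's code: the file naming, the table columns, and the function names are all hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path

def user_db_path(data_dir, user_id):
    # int() raises ValueError for non-numeric input such as "../other",
    # so a crafted ID cannot point outside data_dir.
    return Path(data_dir) / f"{int(user_id)}.sqlite3"

def open_ledger(data_dir, user_id):
    """Open (and create if needed) the ledger file for one user."""
    conn = sqlite3.connect(user_db_path(data_dir, user_id))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(amount_cents INTEGER, category TEXT, logged_at TEXT)"
    )
    conn.commit()
    return conn

def delete_user(data_dir, user_id):
    """Remove the user's ledger file; return True if there was one."""
    path = user_db_path(data_dir, user_id)
    existed = path.exists()
    path.unlink(missing_ok=True)
    return existed

with tempfile.TemporaryDirectory() as data_dir:
    conn = open_ledger(data_dir, 42)
    conn.execute(
        "INSERT INTO transactions VALUES (?, ?, ?)", (450, "coffee", "2026-04-20")
    )
    conn.commit()
    conn.close()
    removed = delete_user(data_dir, 42)
    remaining = list(Path(data_dir).iterdir())
```

With one file per user, "wipe everything" means removing a single file, with no cross-user query to get wrong. A production version would also need to cover backups and any open connections.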

What an LLM actually is good at here

The point of this piece isn't that LLMs are useless for finance. They're great at parsing the input. Capi itself uses an LLM to extract structured transactions out of a line like "forty-five bucks on groceries at mercadona". That's exactly the job LLMs do well: fuzzy, underspecified natural-language input → structured output.

What LLMs are bad at is being the database. The right architecture is LLM-as-parser on top of a real structured store, a real scheduler, a real FX engine, and a real category schema. That's what a tracker is. That's not what ChatGPT is.
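The boundary between the two layers can be sketched as a validation step: whatever the parser emits is checked and typed before anything reaches the store. The JSON shape, the ParsedExpense type, and the validate_llm_output function below are assumptions for illustration, and the LLM call itself is replaced by a hard-coded string.

```python
import json
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation

# Currency codes named earlier in this piece; a real list would be longer.
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP", "BRL", "RUB", "ARS"}

@dataclass(frozen=True)
class ParsedExpense:
    amount: Decimal
    currency: str
    category: str
    merchant: str

def validate_llm_output(raw_json):
    """Turn the parser's JSON into a typed record, or raise ValueError."""
    try:
        data = json.loads(raw_json)
        amount = Decimal(str(data["amount"]))
    except (json.JSONDecodeError, KeyError, TypeError, InvalidOperation) as exc:
        raise ValueError(f"unusable parser output: {exc}") from exc
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive number")
    currency = str(data.get("currency", "")).upper()
    if currency not in ALLOWED_CURRENCIES:
        raise ValueError(f"unsupported currency: {currency!r}")
    return ParsedExpense(
        amount,
        currency,
        str(data.get("category", "")),
        str(data.get("merchant", "")),
    )

# Hypothetical parser output for "forty-five bucks on groceries at mercadona".
raw = '{"amount": 45, "currency": "usd", "category": "groceries", "merchant": "mercadona"}'
expense = validate_llm_output(raw)
print(expense)
```

The model's output is treated as untrusted input. If it is malformed, the write fails loudly instead of putting a guessed row into the ledger.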

The seductive thing about ChatGPT for money is that it can hold the whole conversation in one place. The trap is that a ledger isn't a conversation. It's a record.

The summary, if you're in a hurry

1. No real ledger. ChatGPT memory is a vibe, not a database.
2. Math drifts with length. Confident wrong numbers at scale.
3. Multi-currency breaks. No live FX, no per-row conversion.
4. Category taxonomy slips every turn. Non-comparable reports.
5. No scheduled nudge. The Sunday digest doesn't exist.
6. Privacy model is wrong. Ledger data belongs in a dedicated store.

If any of this resonates — if you've noticed the drift yourself, or you've stopped trusting the numbers — try a tool that was built for this specifically. I'm biased: I built Capi, a conversational tracker that lives inside Telegram. You type or say what you spent, Capi handles the rest. The parts ChatGPT is good at (parsing your message) are LLM-powered. The parts ChatGPT is bad at (the ledger, the math, the schedule, the privacy) run on a proper structured stack underneath.

You tell Capi what you spent. Capi remembers.

A calm capybara on Telegram. Text, voice, or a photo of a receipt.
Seven languages. Multi-currency by default. Free to start.


Written by Daniil Kozin, founder of Capi. Also in this series: Text vs tap — the 0.8-second tax on every expense.