How LLMs Actually Handle Numbers
Here is the single most important thing to understand before using AI for any financial work: language models do not do math the way a calculator does. They predict text. When an LLM tells you that 23% of $4.2M is $966,000, it isn't calculating. It's generating the most statistically likely sequence of tokens that looks like an answer. Sometimes that's correct, as that figure happens to be. Sometimes it's confidently, catastrophically wrong. This module explains why that happens and how to get reliable numerical output.
Why a language model is not a calculator
A calculator executes arithmetic deterministically — the same inputs always produce the same correct output because actual computation happens. A language model generates text by predicting likely token sequences based on patterns in its training data. When you ask it to multiply two numbers, it's drawing on having seen millions of examples of arithmetic in text — not performing the operation.
For small, common calculations the model has effectively memorized the patterns, so it's usually right. For larger or more unusual numbers, the prediction breaks down. The model might produce an answer that's off by an order of magnitude, transpose digits, or simply invent a plausible-looking result.
Imagine someone who has read every math textbook ever written but has never actually done arithmetic — they've only seen the worked examples. Ask them "what's 7 times 8?" and they'll instantly say 56 because they've seen it ten thousand times. Ask them "what's 4,847 times 392?" and they'll confidently produce a number that looks right but may be completely wrong, because they're recalling the shape of an answer rather than computing it.
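For contrast, here is what actually computing those figures looks like. This is a minimal Python sketch, not part of any particular AI tool. The variable names are illustrative, and the use of `Decimal` for the dollar figure is one reasonable choice for avoiding binary floating-point rounding.

```python
from decimal import Decimal

# Exact integer arithmetic: the interpreter performs the multiplication
# instead of recalling what a plausible answer looks like.
a = 4_847
b = 392
product = a * b
print(f"{a:,} x {b:,} = {product:,}")

# The percentage from the introduction, computed rather than predicted.
# Decimal keeps the dollar amount exact.
share = Decimal("0.23") * Decimal("4200000")
print(f"23% of $4.2M = ${share:,.0f}")
```

Running this gives the same result every time, which is exactly the property a text predictor lacks.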
The critical solution: tools that actually compute
The fix is to hand the arithmetic to tools that execute real code instead of predicting the result, and it is what makes AI usable for financial numbers. A prominent example is ChatGPT's Advanced Data Analysis (formerly Code Interpreter), which writes Python and runs it in a sandbox.
When you upload a spreadsheet and ask ChatGPT to calculate the compound annual growth rate, with Advanced Data Analysis enabled it writes Python code, executes it on your actual data, and returns the real computed result. This is fundamentally different from asking it to "estimate" the CAGR from the numbers in the chat — the former is real computation, the latter is text prediction.
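To make the distinction concrete, the code such a tool writes for a CAGR request might look roughly like the sketch below. This is an assumption about the general shape of that code, not a transcript of what any tool produces. The function name `cagr` and the revenue figures are hypothetical, invented for illustration.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical revenue by year, standing in for an uploaded spreadsheet.
revenue_by_year = {
    2019: 1_000_000.0,
    2020: 1_150_000.0,
    2021: 1_300_000.0,
    2022: 1_520_000.0,
    2023: 1_750_000.0,
}

first_year = min(revenue_by_year)
last_year = max(revenue_by_year)

# The number of compounding periods is the gap between the years,
# not the count of data points.
growth = cagr(
    revenue_by_year[first_year],
    revenue_by_year[last_year],
    last_year - first_year,
)
print(f"CAGR {first_year}-{last_year}: {growth:.2%}")
```

The useful habit is to read the code the tool ran, not only its answer: a wrong period count or a wrong column is visible in the code and invisible in a polished summary.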
How to force reliable numerical work
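Ask a language model without code execution to compute a 30-year mortgage amortization, or to sum a column of 40 specific figures, and there's a real chance it produces a confident wrong answer. The output will be formatted perfectly, with clean tables and professional language, which makes the error harder to catch, not easier. Polish is not accuracy. To force reliable numbers, make sure the arithmetic is actually executed: turn on code execution and have the model run code on your data, or do the calculation in a real spreadsheet.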
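The amortization case is a good illustration of why execution matters: 360 dependent rows, each one built from the previous balance. Below is a minimal sketch using the standard fixed-payment formula. The loan terms are hypothetical, the function names are illustrative, and rounding each month's interest to the cent with the final payment absorbing the remainder is one common convention, assumed here rather than taken from any specific lender.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def monthly_payment(principal: Decimal, annual_rate: Decimal, years: int) -> Decimal:
    """Fixed payment for a fully amortizing loan: P * r / (1 - (1 + r) ** -n)."""
    n = years * 12
    r = annual_rate / 12
    if r == 0:
        return (principal / n).quantize(CENT, rounding=ROUND_HALF_UP)
    payment = principal * r / (1 - (1 + r) ** -n)
    return payment.quantize(CENT, rounding=ROUND_HALF_UP)


def amortization_schedule(principal: Decimal, annual_rate: Decimal, years: int):
    """Return (payment, rows); each row is (month, interest, principal_paid, balance)."""
    payment = monthly_payment(principal, annual_rate, years)
    r = annual_rate / 12
    n = years * 12
    balance = principal
    rows = []
    for month in range(1, n + 1):
        interest = (balance * r).quantize(CENT, rounding=ROUND_HALF_UP)
        principal_paid = payment - interest
        # The final payment absorbs the few cents of accumulated rounding.
        if month == n or principal_paid > balance:
            principal_paid = balance
        balance -= principal_paid
        rows.append((month, interest, principal_paid, balance))
    return payment, rows


# Hypothetical loan: $400,000 at 6% for 30 years.
payment, rows = amortization_schedule(Decimal("400000"), Decimal("0.06"), 30)

# Summing a long column is trivial and exact once it is computed, not predicted.
total_interest = sum(row[1] for row in rows)

print(f"Monthly payment: ${payment:,.2f}")
print(f"Total interest over {len(rows)} payments: ${total_interest:,.2f}")
```

Every row here is derived from the one before it, so a single predicted digit that is wrong early on would corrupt everything after it. Executed code does not have that failure mode.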
The right division of labor
Once you understand this, the correct workflow becomes obvious. Use the language model for what it's genuinely good at — understanding context, structuring analysis, explaining concepts, drafting narrative — and use code execution or a real spreadsheet for the actual numbers.
| Give to the LLM's "reasoning" | Give to code execution / spreadsheet |
|---|---|
| Explaining what a metric means | Calculating the metric |
| Structuring a financial model's logic | Running the model's numbers |
| Interpreting what a ratio implies | Computing the ratio from raw data |
| Drafting the narrative around results | Producing the results themselves |
If a financial number an AI gives you would influence a real decision, either it comes from code executed on data you provided, or you verify it against a primary source. There is no third acceptable option. A predicted number is a guess wearing a suit.
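One way to apply this rule is to recompute any quoted figure from the raw data and compare. The sketch below is illustrative only: the revenue and cost figures, the claimed margin, the helper name `agrees`, and the tolerance are all hypothetical choices.

```python
import math


def agrees(claimed: float, recomputed: float, abs_tol: float = 0.0005) -> bool:
    """True when the claimed value matches the recomputed one within abs_tol."""
    return math.isclose(claimed, recomputed, rel_tol=0.0, abs_tol=abs_tol)


# Hypothetical raw figures, standing in for data you supplied yourself.
revenue = 4_200_000.0
cost_of_goods_sold = 2_730_000.0

# Hypothetical figure quoted in an AI-written summary.
claimed_gross_margin = 0.38

# Recompute the ratio from the raw data instead of trusting the quoted value.
recomputed_gross_margin = (revenue - cost_of_goods_sold) / revenue

if agrees(claimed_gross_margin, recomputed_gross_margin):
    print("Claim matches the recomputed value.")
else:
    print(
        f"Mismatch: claimed {claimed_gross_margin:.1%}, "
        f"recomputed {recomputed_gross_margin:.1%}"
    )
```

The tolerance is a judgment call. It should be tight enough to catch a transposed digit and loose enough to ignore display rounding.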
Next
Now that you understand the numerical foundation, Module 3 covers how to use AI for market and company research — where the live-web tools become essential and where the training-cutoff trap catches careless users.