My 5 practical tips for reducing your Claude token usage by changing your workflow.

May 9, 2026
[Image: Blago teaching students about Claude]

The problem nobody wants to admit

I hit Claude's weekly token limit building a single landing page. One page. The kind of thing a decent developer knocks out before lunch.

At the time, I blamed the plan. Surely the limit was too low. Surely Anthropic was just trying to squeeze more subscribers. That felt like a reasonable take for about three days, until I started paying attention to what I was actually doing with those tokens and realised the problem was a lot less interesting than corporate greed. It was just bad workflow.

It took me an embarrassingly long time to land on the actual diagnosis. Most people don't hit Claude limits because of the plan. They hit limits because of bad workflow.

So here are five workflow fixes, with some extra context from someone who learned several of them the expensive way.

1. File management

Your PDFs are eating your budget.

The first and most underrated fix is what you feed Claude before it does anything else.

PDFs are the biggest culprit. A PDF is not just text. It's a container format packed with layout instructions, embedded fonts, image data, and metadata that Claude has to process before it can get to the actual words you care about. Upload a 20-page PDF and you're spending a significant chunk of your context window just on overhead.

The fix is to convert PDFs to markdown or plain text before uploading. Tools like pdftotext, online converters, or even a quick copy-paste into a text editor can strip all of that structural noise and leave you with clean, token-efficient content. Markdown files are dramatically cheaper to process because they're just text with lightweight formatting signals. No hidden layers, no rendering instructions, nothing Claude needs to decode.

The same logic applies to any file you're uploading. Keep them lightweight. Trim what isn't relevant to the task. If you're asking Claude to review a specific section of a document, paste that section instead of uploading the whole thing and telling it to focus on page seven.
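If your documents already live in markdown, trimming to the relevant section is easy to script. A minimal sketch, assuming a conventional `##` heading structure (the `extract_section` helper is my own illustration, not a Claude feature):

```python
def extract_section(markdown: str, heading: str) -> str:
    """Return only the '## <heading>' section, so you paste just what's relevant."""
    out, capturing = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Start capturing at the target heading; stop at the next heading.
            capturing = line[3:].strip().lower() == heading.lower()
        if capturing:
            out.append(line)
    return "\n".join(out)

doc = "## Intro\nBackground.\n## Pricing\nTokens cost money.\n## FAQ\nQ&A."
print(extract_section(doc, "Pricing"))  # ## Pricing\nTokens cost money.
```

Pasting the output of something like this, instead of uploading the full document, is exactly the "page seven" advice made mechanical.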

Better formatting equals lower token cost. That's not a tip. That's just how the thing works.

2. Chat is cheap, code is not

Plan before you build. This one sounds obvious once you hear it. It rarely gets followed in practice.

Claude Code is a powerful tool, and it burns context fast. Every file it reads, every change it makes, every error it encounters and tries to recover from, adds to the running total. Starting Claude Code without a clear plan is like running a taxi meter while you sit in the driveway arguing with yourself about where you want to go.

The smarter move is to do all of your strategic thinking in Claude Chat first. Chat is cheap. It's designed for back-and-forth reasoning without the overhead of a full agentic coding environment. Use it to figure out your architecture, your approach, your edge cases. Ask it to poke holes in your plan before a single line of code gets written. Only once the strategy is settled do you bring in Claude Code or Cowork to actually execute.

Think cheap. Build expensive. In that order.

This is not about being precious with tokens. It's about the quality of what you produce. Agentic tools perform noticeably better when they're given a clear target instead of being asked to figure out the target and hit it simultaneously. The planning phase is not dead time. It's the work that makes the execution work.

3. Ask questions first

There's a very common pattern with AI workflows that looks productive and isn't. You write a prompt, Claude produces something, you realise it missed what you actually needed, you write a correction, it adjusts, you realise the adjustment broke something else, you write another correction. By the end of it you've spent three times the tokens and you're on version four of something that should have been right the first time.

The root cause is almost always that the first prompt didn't give Claude enough to work with.

Before you ask Claude to build or write or analyse anything, spend a moment setting up the context properly. Tell it your goal, not just the task. Define what success looks like. If there are constraints it needs to respect, list them upfront. If there are things you specifically don't want, say so before it goes off and does them enthusiastically.

Most importantly, if the request is complex, ask Claude to clarify before starting. This sounds like it costs time. In practice it saves a significant amount of it, because the alternative is discovering the misunderstanding three exchanges in and having to unwind everything.

Avoid bloated prompts, but don't mistake brevity for efficiency. A clear 200-word prompt that produces the right output is cheaper and faster than a punchy 30-word prompt that kicks off a six-message correction spiral. Better context means fewer wasted tokens. The math on this is not subtle.
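One way to make that discipline mechanical is a small template that forces you to state the goal, constraints, and success criteria every time. A sketch, not any official API; the field names and structure are my own:

```python
def build_prompt(goal, task, constraints=(), success="", avoid=()):
    """Assemble a context-rich prompt so the first attempt has what it needs."""
    parts = [f"Goal: {goal}", f"Task: {task}"]
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    if success:
        parts.append(f"Success looks like: {success}")
    if avoid:
        parts.append("Do not:\n" + "\n".join(f"- {a}" for a in avoid))
    # Bake in the "ask questions first" habit from this section.
    parts.append("If anything above is ambiguous, ask clarifying questions before starting.")
    return "\n\n".join(parts)

print(build_prompt(
    goal="Launch a landing page for a newsletter",
    task="Write the hero section copy",
    constraints=["Under 40 words", "No exclamation marks"],
    success="A headline plus one supporting sentence",
    avoid=["Generic startup clichés"],
))
```

The template itself costs a few dozen extra tokens per prompt; the correction spiral it prevents costs hundreds.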

4. Edit instead of follow-up

This one took me a while to internalise because adding a follow-up message feels like the natural thing to do when something isn't quite right. Claude said the button should be blue and it came out red. So you say "make the button blue." Simple enough.

Except that follow-up message now adds to the conversation history that Claude carries forward. Every correction you add to a thread increases the context Claude has to hold while doing anything else in that conversation. Longer threads mean more cognitive overhead, slower responses, and more tokens spent on history that may not even be relevant anymore.

The cleaner approach is to go back and edit the original prompt instead of layering corrections on top. If the output was wrong, the prompt was probably missing something. Fix the prompt, restart the exchange, and let Claude work from a clean starting point. The result is almost always better than what you'd get by correction-chaining your way toward it.
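The cost of correction-chaining is easy to underestimate, because each turn re-sends the whole history. A back-of-the-envelope sketch; the token counts are invented for illustration, not measured:

```python
def tokens_processed(turn_sizes):
    """Total input tokens across a thread where each turn re-sends all prior turns."""
    total, history = 0, 0
    for size in turn_sizes:
        history += size   # the new message joins the history...
        total += history  # ...and the whole history is processed this turn
    return total

# A 200-token prompt, a 150-token reply, then four 40-token corrections
# each answered with another 150-token reply:
chained = tokens_processed([200, 150] + [40, 150] * 4)
# Versus one edited, better-specified 250-token prompt answered once:
edited = tokens_processed([250, 150])
print(chained, edited)  # 6550 650 — roughly a 10x difference
```

The exact numbers don't matter; the shape does. History cost grows with the square of the thread length, which is why editing the source beats patching the output.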

This matters even more in agentic workflows. In Claude Code, every back-and-forth exchange is not just conversation overhead; it's operational context accumulating in an environment that's already burning through tokens faster than Chat. Cleaner threads mean better performance. It's one of those cases where the disciplined approach also happens to be the cheaper one.

Keep conversations efficient. Every correction costs context. Edit the source, not the output.

5. Match the model to the task

Stop using a Ferrari to go grocery shopping.

This is probably the most actionable change most people can make, and the one least talked about relative to how much difference it makes.

Claude is not one model. It's a family of models at different capability and cost levels, and using the most powerful one by default for everything is a waste in both directions. You're spending more than you need to on simple tasks, and you're potentially leaving the genuinely powerful models unavailable when you actually need them.

The rough breakdown works like this. Claude Haiku is fast and cheap and handles simple, well-defined tasks without breaking a sweat. Quick questions, basic formatting, lookup tasks, light summarisation, anything where the output is predictable and the instructions are clear. Use Haiku for these.

Claude Sonnet sits in the middle and handles the majority of real work. Lighter builds, moderately complex reasoning, iterative tasks that need a bit of nuance but aren't exactly research-grade. Most of what people do with Claude on a daily basis fits here.

Claude Opus and Cowork are for the deep end. Complex architecture decisions, multi-step agentic workflows, tasks that require holding a lot of context and reasoning across it simultaneously. These are genuinely more capable tools, and they cost more for that reason. Using them for tasks that Sonnet handles fine is just burning money on capability you didn't need.
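The tiering above can even be captured as a small routing habit. A sketch with illustrative tier and model names; check Anthropic's documentation for the real model identifiers:

```python
def pick_model(task_complexity: str) -> str:
    """Map a rough complexity tier to a model family (names are illustrative)."""
    routes = {
        "simple": "haiku",     # lookups, formatting, light summarisation
        "standard": "sonnet",  # most day-to-day builds and reasoning
        "deep": "opus",        # architecture, long multi-step agentic work
    }
    # Unknown tier: default to the middle, not the most expensive.
    return routes.get(task_complexity, "sonnet")

print(pick_model("simple"))  # haiku
```

The point isn't the code; it's the reflex. Ask "which tier is this task?" before every session, and the default stops being the priciest option.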

Right model, right cost. Not every job needs the most expensive tool in the shed. Knowing which one to reach for is itself a skill.

The actual conclusion

Claude becomes far more powerful when workflow replaces randomness.

The token limit conversation is almost always a proxy for a workflow conversation that people don't want to have, because it implicates their own habits. Hitting the limit feels like a product problem. It's usually a process problem.

Upgrading your plan gives you more of the same. Fixing your workflow gives you something that actually scales.

The difference, once you've felt it, is not subtle.