Save Time, Save Tokens: A Practical Guide to Prompting That Actually Works

When the output misses the mark, it's not always the model's fault. More often than not, the issue starts with the prompt.

Every message you send to an LLM costs tokens and more importantly, it costs you time. When a vague prompt returns something you didn't ask for, you pay twice: once for the output you can't use, and again for the follow-up correction. In a world where token usage is quickly becoming a genuine line item, learning to prompt well is a necessary skill.

Here's the thing most people don't consider: when the output misses the mark, it's not always the model's fault. More often than not, the issue starts with the prompt. The good news is that better prompting isn't complicated. It comes down to a single, uncomfortable truth that most people ignore.

The One Thing People Get Wrong

LLMs cannot read your mind.

That sounds obvious, but watch how most people prompt: they type a half-formed thought, give 0 context, hit enter, get something back that misses the mark, then spend three more messages steering the model toward what they actually wanted. Four messages to do the job of one. I've seen people get to the point where they start abusing their LLM.

The model isn't failing you and blaming the tool won't get you better results. You're giving it a job description that says "do the thing" and expecting it to know which thing. The fix is a simple framework that forces you to front-load the information a model actually needs.

The RCTO Framework

Think of every prompt as having four components: Role, Context, Task, and Output. Nail all four and you'll rarely need a follow-up.

Role: Tell the Model Who It Is

Before a model processes your request, it needs to know what lens to look through. A marketing copywriter and a technical writer will approach the same brief completely differently. The model is no different.

Weak prompt:

Write me something about our new product launch.

Strong prompt:

You are a senior B2B SaaS copywriter with 10 years of experience writing for enterprise audiences. You favour clarity over cleverness and always lead with the business outcome.

The role anchors tone, vocabulary, depth, and decision-making. Without it, the model defaults to a generic middle ground that serves nobody particularly well.

Context: Give It What It Can't Google

Context is where most prompts fail silently. People assume the model "knows" their situation. It doesn't. It knows language patterns. It has no idea who your audience is, what's already been tried, what constraints exist, or why this task matters right now.

Good context answers the unspoken questions:

Who is this for?
What do they already know?
What has already been done?
What are the constraints (word count, tone, format, platform)?
Why does this matter?

Example context block:

We are a mid-size fintech startup launching a new budgeting feature aimed at 25–35-year-old professionals in Australia. Our brand voice is approachable but never patronising. Competitors in this space include 'X' and 'Y'. This piece will be published on our company blog and shared via LinkedIn.

That single paragraph eliminates dozens of assumptions the model would otherwise have to make and probably get wrong.

Including attachments i.e. documentation; is also a simple and powerful way to provide context.

Task: Be Ruthlessly Specific

"Write a blog post" is not a task. It's a category. A task tells the model exactly what to produce and, just as importantly, what not to produce.

Vague task:

Write a blog post about our new feature.

Specific task:

Write an 800-word blog post that introduces our new Smart Budget feature. Open with a relatable pain point (tracking spending across multiple accounts), explain the feature's three core benefits, and close with a CTA directing readers to the waitlist. Do not compare us to competitors by name. Do not use the word "revolutionise."

The second version leaves almost no room for misinterpretation. The model can execute immediately rather than guessing at scope.

Output: Define the Finish Line

If you don't describe what "done" looks like, the model will decide for you. Sometimes that's fine. Often it isn't.

Specify:

Format: Markdown, bullet points, numbered list, table.
Length: Word count, paragraph count, or page count.
Structure: Should it have headers? A summary at the top? A conclusion?
Tone markers: Formal, conversational, academic, punchy.
What to avoid: Jargon, clichés, specific phrases, certain structural choices.

Example output instruction:

Return the article in Markdown with H2 subheadings. Keep the total length between 750 and 850 words. Use short paragraphs (3–4 sentences max). Do not use bullet points in the body; this should read as a narrative, not a listicle.

Restricting Sources: Control Where the Model Pulls From

This is an underused technique that dramatically improves the reliability of factual or research-based outputs. When you ask a model to answer a technical question, it draws on everything in its training data; which includes forums, opinion pieces, outdated blog posts, and outright misinformation sitting alongside peer-reviewed research.

You can fix this by including a source restriction block in your prompt. Here's a real-world example:

You are answering a question in the domain of: AI

STRICT SOURCE POLICY:
1. Only use these PRIMARY sources:
   - arXiv (arxiv.org)
   - Anthropic docs (docs.claude.com)
   - OpenAI docs (platform.openai.com)
2. Do not cite or rely on unlisted sources unless the user explicitly asks.
3. Avoid these sources for this topic: linkedin.com, x.com.
4. Cite sources for factual claims. If the trusted sources cannot
   answer, say so directly.

This does two things. First, it raises the quality floor of the output by biasing toward authoritative material. Second, it gives you an honesty mechanism; if the model can't answer from trusted sources, it tells you instead of confidently citing something unreliable.

You can adapt this pattern for any domain: legal research (restrict to legislation and case law databases), medical queries (restrict to PubMed and WHO), financial analysis (restrict to SEC filings and central bank publications). The principle is the same: define the information boundary before the model starts working.

The Memory Trap: Why Fresh Chats Matter

LLMs with memory functionality, where the model retains context from previous conversations; are genuinely useful for ongoing projects. But they introduce a subtle problem that most people don't notice until the damage is done.

Memory creates bias.

If you've spent the last ten conversations discussing Brand A's approach to content marketing, and you then ask the model to help you develop a strategy for an unrelated client, the model's suggestions will be quietly shaped by those earlier conversations. It's not malicious. It's pattern completion doing exactly what it's designed to do; drawing on available context. But that context is now contaminated with assumptions from a completely different project.

The practical risk is this: you get output that feels right but is subtly tilted toward frameworks, terminology, or strategic assumptions that belong to a different brief entirely.

The Fix Is Simple

Start a new chat for every new idea, project, or client.

Don't rely on the model to mentally "reset." It won't. Even when you tell it to ignore previous context, the memory system has already shaped what it considers relevant. A clean conversation is the only reliable way to get an unbiased starting point.

And whenever you open that new chat, lead with context. Don't assume the model carries over what you need, and don't assume it doesn't carry over what you don't need. Give it a fresh brief every time:

This is a new project. Disregard any context from previous conversations. Here is the brief for this task: [your RCTO block goes here].

Treat every conversation like a new hire's first day. They're smart, capable, and eager to help, but they know absolutely nothing about your specific situation until you tell them.

Putting It All Together: A Complete Prompt

Here's what a well-structured prompt looks like when all four RCTO components and a source restriction work together:

ROLE:
You are an experienced technical writer specialising in developer
documentation. You write for clarity above all else and assume
the reader is a competent developer who doesn't need hand-holding
but appreciates precise language.

CONTEXT:
We are producing a getting-started guide for our new REST API.
The audience is backend developers (Python and Node.js primarily)
who are evaluating our product against two competitors. This guide
will live on our docs site and is often the first thing a
prospective customer reads. It needs to feel fast and
confidence-building.

TASK:
Write a getting-started guide that walks the reader through:
1. Generating an API key
2. Making their first authenticated request
3. Parsing the response
Include code samples in both Python (using requests) and
Node.js (using fetch). Do not include installation instructions
for the languages themselves.

OUTPUT:
- Markdown format with H2 section headers
- Code blocks with language identifiers
- Total length: 600–800 words
- No introductory fluff — start directly with Step 1
- End with a "Next Steps" section containing 3 linked resources

SOURCE POLICY:
Reference only our official API docs at api.example.com/docs.
Do not reference third-party tutorials or Stack Overflow answers.
If you are unsure about an endpoint's behaviour, flag it rather
than guessing.

That prompt will produce a usable first draft in a single exchange. No follow-ups, no corrections, no wasted tokens or time.

The Bottom Line

Prompting well is not about tricks or secret techniques. It's about recognising that when output misses the mark, the prompt is usually the place to look first, not the model. A language model is a sophisticated tool that requires clear instructions, not a mind reader that will figure out what you meant.

Define the role. Provide the context. Specify the task. Describe the output. Restrict the sources when accuracy matters. Start fresh when the topic changes. And front-load your thinking so the model doesn't have to guess.

Every token you save on corrections is a token you can spend on actual work; and every minute you don't spend re-prompting is a minute back in your day.