Token Efficiency
Where tokens go in an unguided session
Every AI coding session has a budget. The question is how much of that budget goes to the work you actually wanted (writing a PRD, producing a TechSpec, generating code) versus how much goes to the overhead of figuring out where you are and what to do next.
In an unguided session, the overhead is substantial. Without external state, the AI reconstructs context from the conversation history on every turn:
- What stage are we in? Re-read prior messages.
- What was decided in the PM phase? Re-summarize prior artifacts.
- Should I ask a follow-up or proceed? Reason about workflow position.
- Is that artifact saved somewhere, or do I need to regenerate it? Uncertainty about prior work.
For a medium-complexity feature going through PM and Engineering, this overhead typically runs 1,500–3,000 tokens across two stages — before any actual artifact is written. In long-running projects with many prior decisions, it is worse. The AI is spending a significant fraction of its context window re-reading history it should not have to touch.
How s2s eliminates orchestration overhead
Every workflow decision that s2s makes runs in the CLI binary at zero LLM token cost:
| Operation | Token cost |
|---|---|
| Intent classification | 0 |
| Stage route planning | 0 |
| Context package construction | 0 |
| .s2s/live.md state update | 0 |
| Artifact quality assessment | 0 |
| Ledger advancement | 0 |
| Gate creation and lifecycle | 0 |
| Worktree setup and isolation | 0 |
| Git delivery (branch, push, PR) | 0 |
None of this calls an LLM. The orchestrator runs locally, reads from structured files, and writes back to structured files. Your AI chat budget is untouched by every one of these operations.
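To make "zero LLM token cost" concrete, here is a minimal sketch of what deterministic, local orchestration can look like. It is not the s2s implementation: the keyword rules, the route table, and the function names `classify_intent`, `plan_route`, and `build_state` are all hypothetical, chosen only to show that classification and routing can be plain code rather than a model call.

```python
import json

# Hypothetical keyword rules; s2s's real classifier is not documented here.
INTENT_KEYWORDS = {
    "bugfix": ("fix", "bug", "crash"),
    "feature": ("add", "build", "implement"),
}

# Hypothetical routes per intent, reusing the stage names from this page.
ROUTES = {
    "bugfix": ["engineering"],
    "feature": ["pm", "engineering"],
}


def classify_intent(request: str) -> str:
    """Pick an intent by whole-word keyword match; no model call involved."""
    words = request.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "feature"  # assumed fallback when nothing matches


def plan_route(intent: str) -> list:
    """Look up the ordered list of stages for an intent."""
    return list(ROUTES[intent])


def build_state(request: str) -> str:
    """Serialize the decisions as structured JSON, ready to write to a file."""
    intent = classify_intent(request)
    state = {"request": request, "intent": intent, "route": plan_route(intent)}
    return json.dumps(state, indent=2)
```

Every step is a dictionary lookup or a string comparison, so it consumes no tokens from the chat budget however often it runs.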
What the AI gets instead
Rather than reconstructing context from conversation history, the AI receives a context package assembled by the conductor for this specific stage and this specific request. The package contains:
- The original user request and its classified intent
- Where this stage sits in the route
- Prior stage artifacts, summarized and structured — not the raw conversation
- The exact content requirements for this stage’s artifact
- The exact file path to write to
- The exact command to run when done
The AI reads the package, generates the artifact, and submits. It does not need to remember anything from prior turns because the package contains exactly what is relevant.
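The package contents listed above can be pictured as a small structured record. The sketch below is an assumption about shape only: the class name `ContextPackage`, its field names, and the example values (including the output path and the submit-command placeholder) are hypothetical and do not reflect the actual s2s schema.

```python
from dataclasses import asdict, dataclass
import json


@dataclass
class ContextPackage:
    """Hypothetical shape of a context package; not the real s2s schema."""

    user_request: str           # the original request
    intent: str                 # its classified intent
    stage: str                  # the stage this package is for
    route: list                 # ordered stages in the route
    prior_artifacts: dict       # stage name -> structured summary
    content_requirements: list  # what this stage's artifact must contain
    output_path: str            # where the artifact should be written
    submit_command: str         # what to run when done

    def route_position(self) -> str:
        """Describe where this stage sits in the route, e.g. '2 of 2'."""
        index = self.route.index(self.stage) + 1
        return f"{index} of {len(self.route)}"


# Example values are placeholders for illustration only.
package = ContextPackage(
    user_request="Add CSV export to the reports page",
    intent="feature",
    stage="engineering",
    route=["pm", "engineering"],
    prior_artifacts={"pm": "PRD summary: users need CSV export of reports."},
    content_requirements=["architecture overview", "data model changes"],
    output_path="<path supplied by the package>",
    submit_command="<command supplied by the package>",
)

serialized = json.dumps(asdict(package), indent=2)
```

The point of the structure is that everything the AI needs for the stage sits in one record, so nothing has to be recovered from earlier turns.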
The numbers
For a medium-complexity feature going through pm → engineering:
Without s2s:
| Activity | Token estimate |
|---|---|
| Session orientation (re-reading history) | 500–1,000 |
| Reconstructing prior decisions per stage | 300–800 |
| Workflow reasoning per stage | 200–500 |
| Total overhead across two stages | 1,500–3,000 |
With s2s:
| Activity | Token estimate |
|---|---|
| Reading live.md at session start | ~150 |
| Receiving context package per stage | ~200–400 (focused, no redundancy) |
| Total overhead across two stages | ~200–500 |
That is roughly a 6–10x reduction in token overhead for a two-stage feature. The AI’s entire remaining budget goes to the PRD and the TechSpec — the things you actually needed.
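As a rough check on the ratio, the sketch below divides the stated totals from the two tables, comparing low with low, high with high, and midpoint with midpoint. Pairing the ends this way is an assumption about how the figure is read; the page does not state the method.

```python
# Total overhead across two stages, in tokens, as stated in the tables above.
WITHOUT_S2S = (1500, 3000)  # (low, high)
WITH_S2S = (200, 500)       # (low, high)


def midpoint(token_range: tuple) -> float:
    """Average of the low and high ends of a range."""
    return (token_range[0] + token_range[1]) / 2


def reduction_factor(before: float, after: float) -> float:
    """How many times smaller the overhead is after the change."""
    return before / after


low_ratio = reduction_factor(WITHOUT_S2S[0], WITH_S2S[0])   # 1500 / 200 = 7.5
high_ratio = reduction_factor(WITHOUT_S2S[1], WITH_S2S[1])  # 3000 / 500 = 6.0
mid_ratio = reduction_factor(midpoint(WITHOUT_S2S), midpoint(WITH_S2S))  # 2250 / 350

# Absolute tokens freed for the PRD and TechSpec at each end of the range.
saved_low = WITHOUT_S2S[0] - WITH_S2S[0]    # 1300
saved_high = WITHOUT_S2S[1] - WITH_S2S[1]   # 2500
```

Under this pairing the ratio lands between about 6x and 7.5x, with roughly 1,300 to 2,500 tokens returned to the artifact itself.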
The benefit scales with project complexity. On a long-running project with ten prior stages of decisions, the AI would need to reconstruct an enormous amount of history without s2s. With s2s, the context package contains exactly the prior decisions relevant to this stage. The conversation history is irrelevant.
Why focused context beats full history
There is a temptation to think that giving the AI more context is always better. In practice, it is not. A fuller context window means more tokens spent on retrieval and reasoning, more noise mixed with signal, and higher cost per request.
s2s takes the opposite approach: give the AI exactly what it needs for this stage, nothing more. The context package for the Engineering stage contains the PRD summary and the research findings — not the full conversation thread that produced them. The AI gets signal without noise.
This is also why .s2s/live.md is the orientation primitive rather than s2s status. Reading live.md costs around 150 tokens. Running s2s status and reading its output costs around 400 tokens. Over many sessions and many stages, that difference compounds.
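The compounding is simple arithmetic. The sketch below uses the two approximate figures from the paragraph above; the function name `orientation_savings` and the session count in the example are hypothetical.

```python
# Approximate per-orientation costs quoted above, in tokens.
LIVE_MD_TOKENS = 150
STATUS_TOKENS = 400


def orientation_savings(orientations: int) -> int:
    """Tokens saved by reading live.md instead of status output, N times."""
    return (STATUS_TOKENS - LIVE_MD_TOKENS) * orientations


per_orientation = orientation_savings(1)   # 250 tokens each time
over_twenty = orientation_savings(20)      # 5,000 tokens over 20 orientations
```

Each orientation saves about 250 tokens, so a project with 20 session starts (an illustrative number) saves about 5,000.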
What this means for cost
If you are using a paid AI API in standalone mode, the efficiency gains translate directly to lower cost per feature. The orchestration overhead that s2s eliminates is pure waste: tokens that went to workflow reasoning rather than to the artifact your project needed.
For teams running AI-assisted development at scale, eliminating 1,500–3,000 tokens of overhead per feature, across dozens of features per month, is a meaningful reduction in API spend.
In chat-native mode (the default), you are working within your AI client’s existing subscription or session. There, the token savings mean you hit context limits later and session quality stays higher for longer.
Configuring quality checks
Quality checks run on --submit. They are the only place where s2s evaluates artifact content, and even this runs locally, not through an LLM call. The quality score determines whether the artifact advances automatically or requires a review gate.
```json
{
  "quality": {
    "enabled": true,
    "minAutoApproveScore": 0.85,
    "blockOnFailure": false
  }
}
```

minAutoApproveScore is a 0.0–1.0 threshold. Artifacts that score above it advance to the next stage automatically. Artifacts below it return a failure message listing what is missing, and the AI fixes and resubmits. The default is 0.85.
blockOnFailure controls whether a quality failure exits with a non-zero code. Useful in CI pipelines where you want quality failures to be hard failures. Defaults to false.
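The interaction between the two settings can be sketched as a small decision function. This is an illustration of the behavior described above, not s2s code: the names `QualityConfig` and `evaluate_submission` are hypothetical, the exit code `1` stands in for "non-zero", and two details are assumptions the text does not settle (a score exactly equal to the threshold is treated as not passing, and disabled checks let the artifact advance).

```python
from dataclasses import dataclass


@dataclass
class QualityConfig:
    """Mirrors the three documented settings, with their stated defaults."""

    enabled: bool = True
    min_auto_approve_score: float = 0.85
    block_on_failure: bool = False


def evaluate_submission(score: float, config: QualityConfig) -> tuple:
    """Return (outcome, exit_code) for a submitted artifact's quality score."""
    if not config.enabled:
        # Assumption: with checks disabled, nothing holds the artifact back.
        return ("advance", 0)
    if score > config.min_auto_approve_score:
        # Scores above the threshold advance automatically.
        return ("advance", 0)
    # Below the threshold: report failure; exit non-zero only when blocking.
    # The specific code 1 is an assumption; the text says only "non-zero".
    return ("resubmit", 1 if config.block_on_failure else 0)


defaults = QualityConfig()
strict_ci = QualityConfig(block_on_failure=True)
```

With the defaults, a low score produces a failure message but a zero exit code, so an interactive session simply fixes and resubmits. With `block_on_failure` set, the same score fails the process, which is the behavior a CI pipeline would rely on.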
Update these settings with s2s config edit.