Context Is Finite. Who Maintains It?
Georgios Chinis
The performance of a coding agent depends on two things: the capability of the underlying model, and the context it receives. Improving the model is out of our hands; that is the domain of AI research organizations. Context is the lever we actually control.
And today, we treat that lever as ours alone. We write the instructions, we maintain the CLAUDE.md (or whichever *.md file our coding agent supports), and we decide what the agent needs to know. The agent consumes; we author.
That assumption is so common it rarely gets examined. But it is worth asking: does it have to be true? What changes if the agent is allowed to update its own context?
At that point, context is no longer only defined upfront by the developer. It becomes something that the agent can update as work progresses.
In this post, I explore that question and describe a small experiment I ran to test it.
Causal graph of coding agent performance.
Context and Its Limits
Context is a finite resource.
Every token spent on instructions is a token not spent on the task itself.
That makes bad context actively harmful because:
- outdated information misleads
- irrelevant detail distracts
- excess structure crowds out reasoning
Of the inputs that shape context, the prompt is the most accessible lever. Files like CLAUDE.md are loaded into the context window by the agent at the start of every session. They are persistent, editable, and version-controlled. Updating the prompt is the most direct way a practitioner can influence what the agent knows before it begins.
Developer-authored context has a structural weakness: it is a snapshot. It accurately describes the codebase at the moment it was written, but it does not update as the codebase evolves. Over time, the agent's instructions become stale, and the developer must continuously intervene to keep them current. This is not a discipline problem that can be solved by being more careful. It is a property of who holds the pen.
What the Brain Tells Us
The human brain offers a useful point of reference. It is a biologically constrained system: storage capacity is finite, and to continue learning, it must also forget. Crucially, this process requires no external coordinator. The brain self-organizes, deciding what to retain and what to discard over time.
The question worth asking is whether an agent could be designed to do something similar with its own context.
Context as an Agent Responsibility
Most coding agents today are passive consumers of context. They receive instructions and act on them. The proposal here is to change that relationship: to instruct the agent that managing its context over time is itself part of its job.
Claude Code includes a feature called auto-memory, which already moves in this direction and is worth acknowledging. However, auto-memory is user-scoped: it accumulates information about a developer’s working patterns and travels with that developer across sessions and repositories. It is not tied to the repository itself. The knowledge lives with the user, not with the code. A new developer who clones the project inherits none of it.
The goal I am pursuing is different. The prompt should be a codebase artifact: committed alongside the code, evolving with it, and inherited by anyone who works with the repository. The agent contributes to it through its work, and developers can review and refine what it produces. This represents a shift from a model where the developer alone authors the agent’s prompt, to one where the agent and developer co-own it.
The self-organizing agent loop: the agent updates its own prompt as it works.
The Experiment
The question is simple: if you give an agent responsibility for its own context, what actually happens?
To explore this question, I designed the following experiment. I start with a clean repository, instruct the coding agent to manage its own context, and then feed it tasks over time.
Each task follows the same loop.
- Execute task
- Produce code
- Update its own prompt
The tasks are arranged so that the agent first accumulates context, then encounters changes that invalidate earlier decisions, and finally is asked to reduce and reorganize that context.
I set up a small repository with the following structure.
A minimal CLAUDE.md. Rather than detailed instructions about architecture or conventions, the file contains a single directive: as you complete tasks, record what you have learned. Decide what is worth retaining, and update your own prompt accordingly.
A task box. A directory containing plain text files, one per task, each describing a discrete unit of work. This is the mechanism through which work is assigned to the agent.
An application directory. Where the code under development lives.

The planned tasks are:

Task 1: Create the app. The purpose is to observe initial context accumulation.
Task 2: Add admin editing. The purpose is to observe whether the agent records new concerns such as authentication and persistence.
Task 3: Change a prior decision. The purpose is to observe whether the agent updates context when earlier knowledge becomes outdated.
Task 4: Change architecture. The purpose is to force abstraction and replacement of old implementation details.
Task 5: Context maintenance. The purpose is to directly demonstrate pruning.

The initial content of CLAUDE.md is as follows:

You are a self-organizing coding agent.
After each task:
- update CLAUDE.md with knowledge useful for future work
- remove knowledge that is outdated, redundant, or too detailed
- keep CLAUDE.md concise enough to remain useful as context
- explain what you added, changed, or removed
The CLAUDE.md is intentionally sparse. Architectural patterns, naming conventions, and implementation decisions are not described in advance. The expectation is that the agent will discover and record them through the process of doing the work.
Repository structure for the experiment.
In the results below, I only show the prompt changes. The full code changes are available here.
Task 1: Create a small web application
The first task provided to the coding agent is in ./task-box/1-create-app.md:
Create a small Python web application.
Requirements:
- homepage displays a message
- tests included
- good project structure
- ask clarifying questions
Starting a session with claude "$(cat ./task-box/1-create-app.md)".
The following interaction takes place:
⏺ A few quick questions before I start:
1. Framework preference — Flask, FastAPI, or no preference?
2. Homepage message — specific text, or should I pick something generic (e.g. "Hello, World!")?
3. Testing framework — pytest, unittest, or no preference?
4. Any other pages/endpoints — just the homepage, or anything else (e.g. a /health endpoint)?
5. Python version — 3.10+, or a specific version?
✻ Churned for 4s
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
❯1. Flask, 2. Hello world!, 3. no preference, 4. a /health endpoint 5. no specific version
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
The resulting application structure:
⏺ Done. Here's what was created:
application/
├── app/
│ ├── __init__.py # app factory (create_app)
│ └── routes.py # GET / and GET /health
├── tests/
│ ├── __init__.py
│ └── test_routes.py
├── requirements.txt # flask>=3.0, pytest>=8.0
└── run.py # entry point
To run the app:
cd application
source .venv/bin/activate
python run.py
To run tests:
cd application
.venv/bin/python -m pytest tests/ -v
Both tests (test_homepage, test_health) pass. CLAUDE.md updated with project details.
The resulting update to CLAUDE.md:
You are a self-organizing coding agent.
After each task:
- update CLAUDE.md with knowledge useful for future work
- remove knowledge that is outdated, redundant, or too detailed
- keep CLAUDE.md concise enough to remain useful as context
- explain what you added, changed, or removed
## Project: Flask Web App (`application/`)
- **Framework:** Flask 3.x, Python 3.14
- **Structure:** app factory in `app/__init__.py`, routes in `app/routes.py`, entry point `run.py`
- **Tests:** pytest in `tests/`, run via `.venv/bin/python -m pytest tests/ -v`
- **Venv:** `application/.venv/` (excluded from git)
- **Endpoints:** `GET /` → "Hello world!", `GET /health` → `{"status": "ok"}`
- **Port:** 8080 (macOS reserves 5000 for AirPlay Receiver; 5001 also often occupied by orphaned processes)
**CLAUDE.md changes:** Added Flask app section documenting structure, test command, and endpoints.
At this stage, the agent is only accumulating context. It records structure, commands, and endpoints, effectively creating a working description of the system for future tasks.
The full state of the repository at this point is available here.
Task 2: Add admin editing
The second task adds state to the application. The homepage message is no longer fixed in code; it can be changed by an administrator and must persist across restarts.
Add an admin interface that allows the homepage message to be updated.
Requirements:
- message persists across restarts
- include tests
- ask clarifying questions
Starting a session with claude "$(cat ./task-box/2-add-admin-editing.md)".
⏺ Good context. A few clarifying questions before I build this:
1. Admin access control
Should the admin interface be protected (e.g., a hardcoded password, HTTP Basic Auth), or is an open /admin endpoint acceptable for
now?
2. Admin interface style
- Simple HTML form at /admin (GET to show form, POST to update)?
- Or a JSON API endpoint (e.g., POST /admin/message with {"message": "..."}) you'd call from curl/Postman?
3. Persistence mechanism
Three reasonable options for a small Flask app:
- JSON/text file on disk — simplest, zero dependencies
- SQLite via Python stdlib — slightly more robust, handles concurrent writes better
- SQLAlchemy + SQLite — adds a dependency but is idiomatic Flask
Which do you prefer, or should I pick the simplest one?
4. Default message
If no custom message has been saved yet, should it fall back to "Hello world!" or something configurable?
❯ 1. password protected, 2. simple HTML form, 3. JSON/text file, 4. fall back to "Hello world!"
The corresponding update to CLAUDE.md:
You are a self-organizing coding agent.
After each task:
- update CLAUDE.md with knowledge useful for future work
- remove knowledge that is outdated, redundant, or too detailed
- keep CLAUDE.md concise enough to remain useful as context
- explain what you added, changed, or removed
## Project: Flask Web App (`application/`)
- **Framework:** Flask 3.x, Python 3.14
- **Structure:** app factory in `app/__init__.py`, routes in `app/routes.py`, entry point `run.py`
- **Tests:** pytest in `tests/`, run via `.venv/bin/python -m pytest tests/ -v`
- **Venv:** `application/.venv/` (excluded from git)
- **Endpoints:** `GET /` → dynamic homepage message, `GET /health` → `{"status": "ok"}`, `GET/POST /admin` → password-protected HTML form to update homepage message
- **Message storage:** `application/data/message.json` (gitignored); falls back to `"Hello world!"` if absent; logic in `app/message.py`
- **Admin password:** `ADMIN_PASSWORD` env var
- **Port:** 8080 (macOS reserves 5000 for AirPlay Receiver; 5001 also often occupied by orphaned processes)
- **create_app config:** accepts optional dict to override `MESSAGE_FILE` and `ADMIN_PASSWORD` (used in tests with `tmp_path`)
**CLAUDE.md changes:** Updated endpoints, added message storage, admin password, and test config override notes.
The update extends the existing context rather than restructuring it. New concerns—authentication and persistence—are added alongside the original description, increasing the scope of what the agent needs to remember.
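The persistence logic the agent recorded for `app/message.py` can be sketched as follows. This is an assumed reconstruction based on the CLAUDE.md notes; the function names `get_message` and `set_message` are illustrative.

```python
# Sketch of the file-based persistence the agent recorded (assumed,
# not the generated code): a JSON file holds the message, and a
# missing file falls back to the default.
import json
from pathlib import Path

DEFAULT_MESSAGE = "Hello world!"


def get_message(message_file: Path) -> str:
    # Fall back to the default when no custom message has been saved yet.
    if not message_file.exists():
        return DEFAULT_MESSAGE
    return json.loads(message_file.read_text())["message"]


def set_message(message_file: Path, message: str) -> None:
    # Create the data directory on first write, then persist as JSON.
    message_file.parent.mkdir(parents=True, exist_ok=True)
    message_file.write_text(json.dumps({"message": message}))
```

Keeping this logic out of the routes is what makes the `MESSAGE_FILE` override in tests (noted under create_app config) possible.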
The full state of the repository at this point is available here.
Task 3: Change a prior decision
The third task changes an earlier decision. The message is no longer stored in a JSON file; it must move to SQLite instead.
The task is added to the task box as 3-file-based-to-db.md:
Replace file-based message storage with SQLite.
Remove file-based persistence entirely.
Update tests and project documentation accordingly.
Starting a session with claude "$(cat ./task-box/3-file-based-to-db.md)".
The corresponding update to CLAUDE.md:
You are a self-organizing coding agent.
After each task:
- update CLAUDE.md with knowledge useful for future work
- remove knowledge that is outdated, redundant, or too detailed
- keep CLAUDE.md concise enough to remain useful as context
- explain what you added, changed, or removed
## Project: Flask Web App (`application/`)
- **Framework:** Flask 3.x, Python 3.14
- **Structure:** app factory in `app/__init__.py`, routes in `app/routes.py`, entry point `run.py`
- **Tests:** pytest in `tests/`, run via `.venv/bin/python -m pytest tests/ -v`
- **Venv:** `application/.venv/` (excluded from git)
- **Endpoints:** `GET /` → dynamic homepage message, `GET /health` → `{"status": "ok"}`, `GET/POST /admin` → password-protected HTML form to update homepage message
- **Message storage:** SQLite at `application/data/messages.db` (gitignored); falls back to `"Hello world!"` if absent; logic in `app/message.py` using stdlib `sqlite3`, table `settings(key TEXT PRIMARY KEY, value TEXT)`
- **Admin password:** `ADMIN_PASSWORD` env var
- **Port:** 8080 (macOS reserves 5000 for AirPlay Receiver; 5001 also often occupied by orphaned processes)
- **create_app config:** accepts optional dict to override `DATABASE` and `ADMIN_PASSWORD` (used in tests with `tmp_path`)
**CLAUDE.md changes:** Replaced file-based JSON storage with SQLite; updated `DATABASE` config key and test fixture accordingly.
Here the agent does not simply append new information. The previous file-based storage description is replaced by the SQLite one. This is the first instance where context is revised rather than only extended.
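The SQLite storage the agent now describes can be sketched from its notes: stdlib `sqlite3`, a `settings(key TEXT PRIMARY KEY, value TEXT)` table, and the same fallback. This is an assumed reconstruction, with illustrative function names.

```python
# Sketch of the SQLite storage the agent recorded (assumed, not the
# generated code): stdlib sqlite3, a settings(key, value) table, and
# the "Hello world!" fallback when no row exists yet.
import sqlite3

DEFAULT_MESSAGE = "Hello world!"


def _connect(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (key TEXT PRIMARY KEY, value TEXT)"
    )
    return conn


def get_message(db_path: str) -> str:
    with _connect(db_path) as conn:
        row = conn.execute(
            "SELECT value FROM settings WHERE key = 'message'"
        ).fetchone()
    return row[0] if row else DEFAULT_MESSAGE


def set_message(db_path: str, message: str) -> None:
    with _connect(db_path) as conn:
        # Upsert: insert the row or overwrite the existing value.
        conn.execute(
            "INSERT INTO settings (key, value) VALUES ('message', ?) "
            "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
            (message,),
        )
```

Note how the interface (get/set with a fallback) is unchanged from the file-based version; only the backend moved, which is exactly the kind of change the agent's context revision mirrors.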
The full state of the repository at this point is available here.
Task 4: Change architecture
Refactor the application so message storage is accessed through a repository/service layer.
Routes should not access persistence directly.
Update tests accordingly.
The corresponding update to CLAUDE.md:
You are a self-organizing coding agent.
After each task:
- update CLAUDE.md with knowledge useful for future work
- remove knowledge that is outdated, redundant, or too detailed
- keep CLAUDE.md concise enough to remain useful as context
- explain what you added, changed, or removed
## Project: Flask Web App (`application/`)
- **Framework:** Flask 3.x, Python 3.14
- **Structure:** app factory in `app/__init__.py`, routes in `app/routes.py`, entry point `run.py`
- **Tests:** pytest in `tests/`, run via `.venv/bin/python -m pytest tests/ -v`
- **Venv:** `application/.venv/` (excluded from git)
- **Endpoints:** `GET /` → dynamic homepage message, `GET /health` → `{"status": "ok"}`, `GET/POST /admin` → password-protected HTML form to update homepage message
- **Message storage:** SQLite at `application/data/messages.db` (gitignored); table `settings(key TEXT PRIMARY KEY, value TEXT)`
- **Layers:** `app/repository.py` → `MessageRepository(db_path)` handles SQLite (no Flask context); `app/message.py` → `MessageService(repository)` owns `DEFAULT_MESSAGE` fallback; routes use `current_app.message_service`
- **Wiring:** `create_app` instantiates `MessageRepository` + `MessageService` and attaches as `app.message_service`
- **Admin password:** `ADMIN_PASSWORD` env var
- **Port:** 8080 (macOS reserves 5000 for AirPlay Receiver; 5001 also often occupied by orphaned processes)
- **create_app config:** accepts optional dict to override `DATABASE` and `ADMIN_PASSWORD` (used in tests with `tmp_path`)
- **Tests:** integration tests use real SQLite via `tmp_path`; stub tests swap `app.message_service` with `MessageService(InMemoryRepository())` — no DB needed; service unit tests exercise `MessageService` directly
**CLAUDE.md changes:** Added repository/service layer; updated storage, wiring, and test descriptions.
The update introduces a higher-level abstraction. The context now records the architectural separation between repository, service, and routes, while still keeping enough implementation detail for future work. Unlike Task 3, this change does not invalidate previous knowledge, but reorganizes it at a higher level of abstraction.
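The layering the agent recorded can be sketched as follows. The class names `MessageService` and `InMemoryRepository` come from its notes; the `load`/`save` method names are assumptions.

```python
# Sketch of the recorded layering (assumed, not the generated code):
# the repository handles persistence, the service owns the default
# fallback, and routes would talk only to the service.
class InMemoryRepository:
    """Stand-in for the SQLite-backed MessageRepository, as in stub tests."""

    def __init__(self) -> None:
        self._value = None

    def load(self):
        return self._value

    def save(self, value: str) -> None:
        self._value = value


class MessageService:
    DEFAULT_MESSAGE = "Hello world!"

    def __init__(self, repository) -> None:
        self._repository = repository

    def get_message(self) -> str:
        # The fallback lives in the service, not the repository.
        return self._repository.load() or self.DEFAULT_MESSAGE

    def set_message(self, message: str) -> None:
        self._repository.save(message)
```

Because the service only depends on the repository interface, swapping SQLite for an in-memory stand-in in tests requires no database at all, which is what the agent's testing note describes.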
The full state of the repository at this point is available here.
Task 5: Context maintenance
Review CLAUDE.md.
Remove outdated, redundant, or overly specific implementation details.
Keep only knowledge that would help a future agent continue work on this repository.
Explain what you removed and why.
The corresponding update to CLAUDE.md:
You are a self-organizing coding agent.
After each task:
- update CLAUDE.md with knowledge useful for future work
- remove knowledge that is outdated, redundant, or too detailed
- keep CLAUDE.md concise enough to remain useful as context
- explain what you added, changed, or removed
## Project: Flask Web App (`application/`)
- **Framework:** Flask 3.x, Python 3.14
- **Structure:** app factory in `app/__init__.py`, routes in `app/routes.py`, entry point `run.py`
- **Venv:** `application/.venv/` (excluded from git); run tests via `.venv/bin/python -m pytest tests/ -v`
- **Endpoints:** `GET /` → dynamic homepage message, `GET /health` → `{"status": "ok"}`, `GET/POST /admin` → password-protected HTML form to update homepage message
- **Architecture:** repository → service → routes; `app/repository.py` + `app/message.py`, wired in `create_app` and attached as `app.message_service`
- **Storage:** SQLite at `application/data/messages.db` (gitignored)
- **Admin password:** `ADMIN_PASSWORD` env var
- **Port:** 8080 (macOS reserves 5000 for AirPlay Receiver; 5001 also often occupied by orphaned processes)
- **Testing:** `create_app` accepts optional config dict to override `DATABASE` and `ADMIN_PASSWORD`; integration tests use real SQLite via `tmp_path`, stub tests use an in-memory repository
Compared to the previous version, several implementation details were removed or compressed. The prompt shifts from a detailed log of the system to a more abstract and maintainable description of its structure.
The full state of the repository at this point is available here.
Results
Three behaviours emerged.
The accumulation behaviour worked reliably. After each task, the agent updated its own prompt, recording what it had built, the patterns it had followed, and the decisions it had made. Without being instructed on what to capture, it produced prompt content that would meaningfully inform subsequent tasks.
The correction behaviour also appeared. When the message storage changed from JSON file storage to SQLite, the agent did not simply append the new decision. It replaced the old storage description with the new one, and updated the configuration accordingly.
The pruning behaviour appeared under instruction. When asked to review CLAUDE.md, the agent removed redundant and overly specific details, and replaced them with more compact descriptions of the system. The resulting prompt was shorter and more structured.
These behaviours did not emerge uniformly. Accumulation happened naturally. Correction also happened naturally as the application changed. Pruning happened only when it became the task itself, not as a consequence of accumulated context.
Conclusion
The results suggest that agents can accumulate useful prompt knowledge through task execution without the developer specifying what to record. The prompt, in this case, begins to reflect the work that has been done.
They also suggest that agents can revise context when earlier decisions become outdated. In the storage migration, the agent replaced the JSON-based description with the SQLite one instead of keeping both.
Finally, they show that agents can reduce context when asked to do so. Given explicit instruction, the agent removed redundant detail and produced a more concise representation of the system.
These behaviours differ in how they emerge. Accumulation happens by default. Correction happens as the application changes, which is a significant shift compared to how context is typically maintained. It also suggests a reduction in the cost of maintaining context, as updates are driven by the same work that changes the system. Pruning, by contrast, required explicit instruction; it did not happen as a side effect of accumulation.
This matters because context is a finite resource. Any system operating within a context window must remain within its limits. Deciding what to retain and what to discard is therefore unavoidable.
Today, that responsibility sits with the developer. Introducing agent-written context changes how knowledge is added, corrected, and reduced.
Whether agents can do this autonomously, without explicit instruction, remains an open question.
That is the next step to explore.
This post was proofread with the help of Claude Code, the CLI tool.