Better outcomes from LLMs
These are some of the main areas where I’ve seen outsized results that make me rethink how I work.
Planning vs. execution / Probabilistic vs. deterministic
LLMs are probabilistic: run them twice with the same input and you'll get slightly different results. Traditional software is deterministic; at its simplest, a pure function returns the same output for the same input, every time.
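To make the contrast concrete, here's a minimal sketch. The `call_llm` function is a stub that simulates sampling with `random` so the snippet runs offline; it's not a real client, just the shape of the difference:

```python
import random

def slugify(title: str) -> str:
    """Pure function: the same input always produces the same output."""
    return title.lower().strip().replace(" ", "-")

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call. Real clients sample from a token
    distribution, so repeated calls with the same prompt can differ;
    random.choice simulates that here."""
    return random.choice([
        "better-outcomes-from-llms",
        "llm-outcomes",
        "getting-better-results-from-llms",
    ])

print(slugify("Better outcomes from LLMs"))   # identical on every run
print(call_llm("Write a slug for: Better outcomes from LLMs"))  # varies
```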
This means LLMs are good at the creative part of work, but suck at the highly repeatable stuff, much like humans. Ask us to do the same task 1000x and we’ll do it a bit differently each time for a bunch of reasons.
Our human coping mechanisms seem like good starting points for solving the same problem in LLM land.
Checklist manifesto, anyone?
Todo lists / plan mode are now a staple of agentic coding tools, but that seems to me like we're just scratching the surface. Think about the much larger, complex projects we build as humans: there are multiple levels of planning, delegated to specialists, with access to additional context on demand. In software land, that's what has drawn me to the ideas in collaborative specification.
Practically speaking, I think this means intentionally reframing more of our engagement with LLMs as something more like “explain how you would do X” or “build me a plan for X”, then iterating on the plan together as a creative process. Once the plan is defined, the LLM can probably just execute it. If not, it can write code that executes the plan deterministically.
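As a rough sketch of what that reframing can look like in code (the `llm` function and both prompts are placeholders, not any particular tool's API):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client of choice here")

def plan_then_execute(task: str) -> str:
    # Phase 1: the creative, probabilistic part -- iterate on it together.
    plan = llm(f"Build me a plan for: {task}. "
               "List steps, assumptions, and open questions.")
    while (feedback := input(f"Plan:\n{plan}\n\nEdits? (empty to accept) ")):
        plan = llm(f"Revise this plan per the feedback.\n"
                   f"Feedback: {feedback}\n\nPlan:\n{plan}")

    # Phase 2: execution. For highly repeatable steps, ask for code that
    # runs deterministically instead of free-form LLM output.
    return llm("Execute this plan step by step. For any repetitive step, "
               f"write code to perform it deterministically:\n{plan}")
```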
While the above may be where LLM systems are evolving as invisible internal functionality, I think it's really helpful to be in the loop from the start. The planning is now the work. We shouldn't skip it.
A bit of a wild side note about probabilistic vs. deterministic: it seems like deterministic software is almost always preferable in production. Could agents use self-observability to spot their own repetitive conclusions and context collection, then write deterministic software to replace them, reducing costs and making outcomes predictable?
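I haven't seen an agent do this yet, so the following is purely speculative: a crude version could count repeated task types, ask the model once for a deterministic implementation, and route future calls through that instead.

```python
from collections import Counter

class SelfCompilingAgent:
    """Speculative sketch: after seeing the same kind of task a few times,
    have the model write deterministic code and use that from then on."""

    def __init__(self, llm, threshold: int = 3):
        self.llm = llm
        self.threshold = threshold
        self.seen = Counter()
        self.compiled = {}  # task_kind -> deterministic handler

    def run(self, task_kind: str, payload: str) -> str:
        if task_kind in self.compiled:
            # Deterministic path: predictable output, near-zero token cost.
            return self.compiled[task_kind](payload)
        self.seen[task_kind] += 1
        if self.seen[task_kind] >= self.threshold:
            source = self.llm(
                "Write a pure Python function handle(payload: str) -> str "
                f"that performs this task deterministically: {task_kind}"
            )
            namespace = {}
            exec(source, namespace)  # in reality: sandbox it and review it
            self.compiled[task_kind] = namespace["handle"]
        return self.llm(f"{task_kind}: {payload}")
```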
Transformations
LLMs do a fantastic job of transforming information from one language to another. I mean this in a broad sense, whether it’s translating spoken languages, programming languages or functional specifications.
At a certain level, agentic coding is just transformations, which ties back to my area of interest in collaborative specification: spec <-> code transformations.
Practically, this means LLMs are really good at refactoring code or rewriting it in a different language or framework. That used to be true only conceptually; in practice, the latest models (e.g. Sonnet 4.5) and agents (e.g. Claude Code 2) can now do it in large codebases with few errors. Fewer than humans would make, in many cases.
This unlocks the option to refactor continuously, at a rate I would never have considered when humans did all the work. While it's situation dependent, I'd now err on the side of having agents continually look for and prototype refactoring opportunities, framework substitutions, package updates, etc. A cleaner codebase makes both the LLMs and the humans a bit more effective for every future change.
This does require giving the LLM context on the priorities and direction of the overall business and product, which comes back to the same topic of collaborative specification.
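A sketch of what the continuous-refactoring loop might look like, pairing the probabilistic transformation with a deterministic check (the `llm` function is a stub, and the pytest invocation assumes an existing test suite):

```python
import subprocess

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client of choice here")

def refactor(source: str, goal: str) -> str:
    # Probabilistic step: propose the transformation.
    return llm(
        f"Refactor this module: {goal}. Preserve behavior and the public "
        f"API exactly; all existing tests must still pass.\n\n{source}"
    )

def tests_pass() -> bool:
    # Deterministic backstop: the unchanged test suite decides.
    return subprocess.run(["pytest", "-q"]).returncode == 0
```

The pattern generalizes: let the probabilistic step propose, let a deterministic step dispose.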
This transformation capability is related to how good they are at digesting APIs and building new wrappers / matching API capabilities. There's an interesting read from Cloudflare on adding a coding layer on top of MCP: https://blog.cloudflare.com/code-mode/. I suspect this kind of thinking applies in many places: if agent-coded transformations are nearly free and highly accurate, where else would we add a layer in our applications?
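In the spirit of the Cloudflare post, here's the rough shape of that layer. The tool names and `mcp_call` are hypothetical; the point is that the model writes one script instead of emitting one tool call per round trip:

```python
def mcp_call(tool: str, **kwargs):
    raise NotImplementedError("forward this to your MCP client")

# Generated, typed wrappers over MCP tools:
def list_invoices(customer_id: str) -> list[dict]:
    return mcp_call("billing.list_invoices", customer_id=customer_id)

def send_email(to: str, subject: str, body: str) -> None:
    mcp_call("email.send", to=to, subject=subject, body=body)

# The model then writes ordinary code against the wrappers -- loops,
# conditionals, aggregation -- and it all runs in one sandboxed shot:
def remind_overdue(customer_id: str, email: str) -> None:
    overdue = [i for i in list_invoices(customer_id) if i["status"] == "overdue"]
    if overdue:
        send_email(email, "Overdue invoices",
                   f"You have {len(overdue)} overdue invoice(s).")
```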
Browser use
In dev, this looks like Playwright MCP, but now with Atlas, it may become mainstream. The latest LLMs do a great job of looking at a page and understanding relationships between elements, likely data types and user flows.
Things to do with the capability:
- Test your application. (duh)
- Point the browser at existing web applications to explore and extract a likely database schema, user flows, application logic, etc. (see the sketch after this list)
  - Essentially, clone the spec into an LLM-friendly format
  - Then explore the spec manually or by asking the LLM questions about it
- Good for in-depth competitor analysis
  - You could ask the LLM questions about a competitor directly, but browsing first forces it to build an internally coherent scaffold and makes that available for follow-up analysis. Somewhat similar to todo list tools in agents today.
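Here's a rough sketch of the schema-extraction idea above. Playwright's Python API is real; the `llm` function is a stub, and the prompt is just one way to phrase it:

```python
from playwright.sync_api import sync_playwright

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client of choice here")

def extract_spec(urls: list[str]) -> str:
    """Browse an existing web app, then ask the model to infer its spec."""
    pages = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for url in urls:
            page.goto(url)
            # Raw HTML is noisy; an accessibility snapshot or stripped DOM
            # works better in practice. Truncated here to keep it short.
            pages.append(f"URL: {url}\n{page.content()[:20000]}")
        browser.close()
    return llm(
        "From these pages, infer a likely database schema, the main user "
        "flows, and the application logic. Write it up as a spec document "
        "I can query later:\n\n" + "\n\n".join(pages)
    )
```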