Better outcomes from LLMs
These are some of the main areas where I’ve seen outsized results that make me rethink how I work.
Planning vs. execution / Probabilistic vs. deterministic
LLMs are probabilistic: run them twice with the same input and you'll get slightly different results. Traditional software is deterministic; at its simplest, a pure function returns the same output for the same input, every time.
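To make the contrast concrete, here's a minimal sketch. The `call_llm` function is a stub that simulates sampling with `random` so the snippet runs offline; it's not a real client, just the shape of the difference:

```python
import random

def slugify(title: str) -> str:
    """Pure function: the same input always produces the same output."""
    return title.lower().strip().replace(" ", "-")

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call. Real clients sample from a token
    distribution, so repeated calls with the same prompt can differ;
    random.choice simulates that here."""
    return random.choice([
        "better-outcomes-from-llms",
        "llm-outcomes",
        "getting-better-results-from-llms",
    ])

print(slugify("Better outcomes from LLMs"))   # identical on every run
print(call_llm("Write a slug for: Better outcomes from LLMs"))  # varies
```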
This means LLMs are good at the creative part of work, but suck at the highly repeatable stuff, much like humans. Ask us to do the same task 1000x and we’ll do it a bit differently each time for a bunch of reasons.
Our human coping mechanisms seem like good starting points for solving the same problem in LLM land.
Checklist manifesto, anyone?
Todo lists / plan mode are now a staple of agentic coding tools, but that seems to me like we're just scratching the surface. Think about the much larger, complex projects we build as humans: there are multiple levels of planning, delegated to specialists, with access to additional context on demand. In software land, that's what has drawn me to the ideas in collaborative specification.
Practically speaking, I think this means intentionally reframing more of our engagement with LLMs as something more like “explain how you would do X” or “build me a plan for X”, then iterating on the plan together as a creative process. Once the plan is defined, the LLM can probably just execute it. If not, it can write code that executes the plan deterministically.
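As a rough sketch of what that reframing can look like in code (the `llm` function and both prompts are placeholders, not any particular tool's API):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client of choice here")

def plan_then_execute(task: str) -> str:
    # Phase 1: the creative, probabilistic part -- iterate on it together.
    plan = llm(f"Build me a plan for: {task}. "
               "List steps, assumptions, and open questions.")
    while (feedback := input(f"Plan:\n{plan}\n\nEdits? (empty to accept) ")):
        plan = llm(f"Revise this plan per the feedback.\n"
                   f"Feedback: {feedback}\n\nPlan:\n{plan}")

    # Phase 2: execution. For highly repeatable steps, ask for code that
    # runs deterministically instead of free-form LLM output.
    return llm("Execute this plan step by step. For any repetitive step, "
               f"write code to perform it deterministically:\n{plan}")
```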
While the above may be where LLM systems are evolving as invisible internal functionality, I think it's really helpful to be in the loop from the start. The planning is now the work. We shouldn't skip it.
A bit of a wild side note about probabilistic vs. deterministic: it seems like deterministic software is almost always preferable in production. Could agents use self-observability to spot their own repetitive conclusions and context collection, then write deterministic software to replace them, reducing costs and making outcomes predictable?
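I haven't seen an agent do this yet, so the following is purely speculative: a crude version could count repeated task types, ask the model once for a deterministic implementation, and route future calls through that instead.

```python
from collections import Counter

class SelfCompilingAgent:
    """Speculative sketch: after seeing the same kind of task a few times,
    have the model write deterministic code and use that from then on."""

    def __init__(self, llm, threshold: int = 3):
        self.llm = llm
        self.threshold = threshold
        self.seen = Counter()
        self.compiled = {}  # task_kind -> deterministic handler

    def run(self, task_kind: str, payload: str) -> str:
        if task_kind in self.compiled:
            # Deterministic path: predictable output, near-zero token cost.
            return self.compiled[task_kind](payload)
        self.seen[task_kind] += 1
        if self.seen[task_kind] >= self.threshold:
            source = self.llm(
                "Write a pure Python function handle(payload: str) -> str "
                f"that performs this task deterministically: {task_kind}"
            )
            namespace = {}
            exec(source, namespace)  # in reality: sandbox it and review it
            self.compiled[task_kind] = namespace["handle"]
        return self.llm(f"{task_kind}: {payload}")
```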
Transformations
LLMs do a fantastic job of transforming information from one language to another. I mean this in a broad sense, whether it’s translating spoken languages, programming languages or functional specifications.
At a certain level, agentic coding is just transformations, which ties back to my area of interest in collaborative specification: spec <-> code transformations.
Practically, this means LLMs are really good at refactoring code or rewriting it in a different language or framework. That used to be true only conceptually; in practice, the latest models (e.g. Sonnet 4.5) and agents (e.g. Claude Code 2) can now do it in large codebases with few errors. Fewer than humans would make, in many cases.
This unlocks the option to refactor continuously, at a rate I would never have considered when humans did all the work. While it's situation dependent, I'd now err on the side of having agents continually look for and prototype refactoring opportunities, framework substitutions, package updates, etc. A cleaner codebase makes both the LLMs and the humans a bit more effective for every future change.
This does require giving the LLM context on the priorities and direction of the overall business and product, which comes back to the same topic of collaborative specification.
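A sketch of what the continuous-refactoring loop might look like, pairing the probabilistic transformation with a deterministic check (the `llm` function is a stub, and the pytest invocation assumes an existing test suite):

```python
import subprocess

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client of choice here")

def refactor(source: str, goal: str) -> str:
    # Probabilistic step: propose the transformation.
    return llm(
        f"Refactor this module: {goal}. Preserve behavior and the public "
        f"API exactly; all existing tests must still pass.\n\n{source}"
    )

def tests_pass() -> bool:
    # Deterministic backstop: the unchanged test suite decides.
    return subprocess.run(["pytest", "-q"]).returncode == 0
```

The pattern generalizes: let the probabilistic step propose, let a deterministic step dispose.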
This transformation capability is related to how good they are at digesting APIs and building new wrappers / matching API capabilities. There's an interesting read from Cloudflare on adding a coding layer on top of MCP: https://blog.cloudflare.com/code-mode/. I suspect this kind of thinking applies in many places: if agent-coded transformations are nearly free and highly accurate, where else would we add a layer in our applications?
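In the spirit of the Cloudflare post, here's the rough shape of that layer. The tool names and `mcp_call` are hypothetical; the point is that the model writes one script instead of emitting one tool call per round trip:

```python
def mcp_call(tool: str, **kwargs):
    raise NotImplementedError("forward this to your MCP client")

# Generated, typed wrappers over MCP tools:
def list_invoices(customer_id: str) -> list[dict]:
    return mcp_call("billing.list_invoices", customer_id=customer_id)

def send_email(to: str, subject: str, body: str) -> None:
    mcp_call("email.send", to=to, subject=subject, body=body)

# The model then writes ordinary code against the wrappers -- loops,
# conditionals, aggregation -- and it all runs in one sandboxed shot:
def remind_overdue(customer_id: str, email: str) -> None:
    overdue = [i for i in list_invoices(customer_id) if i["status"] == "overdue"]
    if overdue:
        send_email(email, "Overdue invoices",
                   f"You have {len(overdue)} overdue invoice(s).")
```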
Browser use
In dev, this looks like Playwright MCP, but now with Atlas, it may become mainstream. The latest LLMs do a great job of looking at a page and understanding relationships between elements, likely data types and user flows.
Things to do with the capability:
- Test your application. (duh)
- Point the browser at existing web applications to explore and extract a likely database schema, user flows, application logic, etc. (see the sketch after this list)
  - Essentially, clone the spec into an LLM-friendly format
  - Then explore the spec manually or by asking the LLM questions about it
- Good for in-depth competitor analysis
  - You could ask the LLM questions about a competitor directly, but browsing first forces it to build an internally coherent scaffold and makes that available for follow-up analysis. Somewhat similar to todo list tools in agents today.
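Here's a rough sketch of the schema-extraction idea above. Playwright's Python API is real; the `llm` function is a stub, and the prompt is just one way to phrase it:

```python
from playwright.sync_api import sync_playwright

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model client of choice here")

def extract_spec(urls: list[str]) -> str:
    """Browse an existing web app, then ask the model to infer its spec."""
    pages = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        for url in urls:
            page.goto(url)
            # Raw HTML is noisy; an accessibility snapshot or stripped DOM
            # works better in practice. Truncated here to keep it short.
            pages.append(f"URL: {url}\n{page.content()[:20000]}")
        browser.close()
    return llm(
        "From these pages, infer a likely database schema, the main user "
        "flows, and the application logic. Write it up as a spec document "
        "I can query later:\n\n" + "\n\n".join(pages)
    )
```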