AI Basics

Anyone who uses AI quickly runs into two scenarios: sometimes it performs brilliantly, and sometimes it fails completely on a seemingly similar task.

How can we build complex, useful things with AI if its outputs are this radically different?

To me, the answer is a continual loop between learning new things and playing with AI to probe what it can and cannot do. That loop builds an intuitive understanding of the edges of current AI technology, letting us use it, and build useful things with it, with confidence.

Andrej Karpathy’s Deep Dive into LLMs video is a great place to learn about the basic components. It is well worth the 3+ hours; every topic covered helps explain the core limitations of Large Language Models.

Another fun option is 3Blue1Brown’s neural network & LLM series on YouTube.

LLM Limitations

Large Language Models on their own are a building block with significant limitations.

LLM systems explains how these limitations can be mitigated with techniques like Context Engineering, Tools, and Agents wrapped around the core models to build more complex, accurate LLM systems. Consumer applications like ChatGPT combine all of these elements.

flowchart LR
    U(["User"]) --> A(["Agent"])
    A --> B(["Context Engineering"])
    B --> C(["LLM"])
    C --> R(["Response"])

    %% Tools interact bidirectionally with the LLM
    T(["Tools"])
    T <--> C

    %% Styles: no borders, transparent background, distinct fills for nodes
    %% U: light pink for User; R: light blue for Response
    style U fill:#ffccff,stroke:none,color:#000,font-weight:bold
    style A fill:#ff9999,stroke:none,color:#000,font-weight:bold
    style B fill:#ccffcc,stroke:none,color:#000,font-weight:bold
    style C fill:#ffcc00,stroke:none,color:#000,font-weight:bold
    style R fill:#b3e0ff,stroke:none,color:#000,font-weight:bold
    style T fill:#99ccff,stroke:none,color:#000,font-weight:bold

    %% Transparent overall background
    %% (depends on renderer support)
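
The flow in the diagram can be sketched as a minimal agent loop. Everything here is illustrative, not any real API: call_llm is a stand-in that fakes one tool round-trip, and TOOLS is a hypothetical registry.

```python
def get_time(_args):
    # Hypothetical tool: in a real system this would hit a clock or API.
    return "12:00"

# Hypothetical tool registry: tool name -> function.
TOOLS = {"get_time": get_time}

def call_llm(messages):
    # Stand-in for a real model call. If no tool result is in the
    # context yet, "decide" to call a tool; otherwise, answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "get_time", "args": {}}
    return {"answer": "It is 12:00."}

def run_agent(user_input, max_steps=5):
    # Context engineering: system prompt + user message form the context.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "answer" in reply:  # model is done; return the response
            return reply["answer"]
        # Tools interact bidirectionally with the LLM: run the
        # requested tool and feed its result back into the context.
        result = TOOLS[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": result})
    return "Step limit reached."

print(run_agent("What time is it?"))
```

The loop terminates either when the model produces an answer or when the step budget runs out, which is the usual guardrail against an agent calling tools forever.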
    

Pace of change

Here’s a brief overview of the major LLM models and the products built on top of them by the frontier model companies.

We’re seeing new models from the frontier model companies every few months. The gains are not always obvious in their productized chatbot versions, but they are quite apparent in most coding agents (Claude Code, Codex, Cursor, etc.).

One component of their improvement is how they are trained to use tools, enabling better performance and new use cases like computer use.

Agents and tools change at their own, higher velocity, and context engineering techniques improve continuously. One note there: as models get better at reasoning and tool use, their ability to dynamically acquire useful context for in-context learning or deep research tasks improves with minimal prompting. https://youtu.be/XuvKFsktX0Q?si=Qj2XGz1YgwjoeVPR&t=418

As I have worked with agentic coding this year, I have stripped my agent prompts of many of the original rules that kept them on the rails, particularly with Sonnet 4.5 and Claude Code 2, or Codex.

For non-coders, that means chatbot implementations (e.g. ChatGPT) do a better job of using web search and reasoning to reach good answers with less guidance in the prompt.

For coders, it suggests that the custom-agent development hype is a bubble. The Codex and Claude Code harnesses/agents are rapidly becoming general-purpose AI tools that can meet many agentic business needs with minimal guidance.