Beyond RPI: Make Your AI Verify Its Assumptions
The current standard for agentic coding is RPI: Research, Plan, Implement. The agent reads your codebase to generate research, generates a plan based on that research, then writes code based on the plan.
The problem with this approach is that verifying each phase requires a human in the loop. You have to read the research summary, check that the plan makes sense, and review the code. It's extremely tedious, and it doesn't scale, especially with multi-agent setups.
Here's the key improvement I'm proposing: every phase outputs executable code that proves the agent's understanding.
The "Problem" with RPI
During research, the agent outputs a markdown doc summarizing what it learned. You read it, it looks reasonable, you move on. But "looks reasonable" isn't verification. The prose isn't even precise enough to be fully verifiable: the AI is using shorthand, and you're both making assumptions.
During planning, same thing. Another markdown doc. Another skim. References real files, so it's probably fine, right?
During implementation, it writes code. When it breaks, you can usually figure out what went wrong. But the agent generates code in seconds while you're still reading the plan from ten minutes ago. You might even have another agent waiting on you to verify its research. This is not scalable.
The deeper issue: AI is great at gathering context by reading files, calling tools, and building up a research doc, but it's bad at knowing when it has enough context. It doesn't know what it doesn't know. So it confidently produces a research summary that's missing the one file that would've changed everything.
The Fix: Executable Verification
Each phase should output runnable code that proves its assumptions. The key insight is that verification depth should decrease as you progress.
Research needs deep, precise verification because you're establishing ground truth about the codebase. Planning verification is lighter: you're just checking that preconditions hold. For implementation, verification is just standard unit and integration testing.
This directly mirrors the point in @dexhorthy's well-regarded "Advanced Context Engineering" about how much impact a bad output has at each phase of the workflow.
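To make the gradient concrete, here's a minimal sketch of the shape each phase could hand to the next. These names (VerifiedClaim, PhaseOutput, and so on) are illustrative, not from any existing tool:

// Illustrative only: rough shapes for what each phase hands to the next.
// Research emits deep executable assumptions, planning emits light
// precondition checks per step, implementation just points at the test suite.

interface VerifiedClaim {
  claim: string;
  verify: () => Promise<boolean>;
}

interface ResearchOutput {
  kind: "research";
  assumptions: VerifiedClaim[]; // deep: AST analysis, cross-file invariants
}

interface PlanOutput {
  kind: "plan";
  steps: Array<{ name: string; preconditions: VerifiedClaim[] }>; // light checks
}

interface ImplementationOutput {
  kind: "implementation";
  testCommand: string; // ordinary tests, e.g. "vitest run"
}

export type PhaseOutput = ResearchOutput | PlanOutput | ImplementationOutput;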
Research Phase: Deep Verification
At this step, verification should be deep. The agent writes scripts that prove specific claims about the codebase, and not just random claims: each assumption should be directly relevant to the user's goal and written with the planning phase in mind. You're building the foundation the plan will stand on, like building a house on solid rock.
import { Project, SyntaxKind } from "ts-morph";

const project = new Project({ tsConfigFilePath: "./tsconfig.json" });

interface Assumption {
  claim: string;
  verify: () => Promise<boolean>;
}

export const assumptions: Assumption[] = [
  {
    claim: "Authentication uses JWT with RS256 signing",
    verify: async () => {
      const authFiles = project.getSourceFiles("**/auth/**/*.ts");
      for (const file of authFiles) {
        const calls = file.getDescendantsOfKind(SyntaxKind.CallExpression);
        for (const call of calls) {
          const text = call.getText();
          if (text.includes("sign") || text.includes("verify")) {
            const args = call.getArguments();
            const optionsArg = args.find((a) => a.getText().includes("algorithm"));
            if (optionsArg?.getText().includes("RS256")) {
              return true;
            }
          }
        }
      }
      return false;
    },
  },
  {
    claim: "UserService.authenticate returns a Result type, not exceptions",
    verify: async () => {
      // getSourceFile doesn't take globs, so use getSourceFiles and pick the match
      const userService = project.getSourceFiles("**/UserService.ts")[0];
      if (!userService) return false;
      const authMethod = userService.getClasses()[0]?.getMethod("authenticate");
      if (!authMethod) return false;
      const returnType = authMethod.getReturnType().getText();
      return returnType.includes("|") && (returnType.includes("ok: true") || returnType.includes("success"));
    },
  },
  {
    claim: "All database queries go through Repository layer",
    verify: async () => {
      const allFiles = project.getSourceFiles("src/**/*.ts");
      const repoFiles = project.getSourceFiles("**/repositories/**/*.ts");
      const repoFilePaths = new Set(repoFiles.map((f) => f.getFilePath()));
      for (const file of allFiles) {
        if (repoFilePaths.has(file.getFilePath())) continue;
        if (file.getFilePath().includes(".test.")) continue;
        const imports = file.getImportDeclarations();
        for (const imp of imports) {
          const mod = imp.getModuleSpecifierValue();
          if (mod.includes("prisma") || mod.includes("pg")) {
            return false; // Direct DB import outside repository
          }
        }
      }
      return true;
    },
  },
  {
    claim: "Services use dependency injection via constructor",
    verify: async () => {
      const serviceFiles = project.getSourceFiles("**/services/**/*.ts");
      for (const file of serviceFiles) {
        for (const cls of file.getClasses()) {
          const ctor = cls.getConstructors()[0];
          if (!ctor) continue;
          const params = ctor.getParameters();
          const hasInjection = params.some((p) => {
            const type = p.getType().getText();
            return type.includes("Repository") || type.includes("Service") || type.includes("Client");
          });
          if (cls.getMethods().length > 0 && !hasInjection) {
            return false;
          }
        }
      }
      return true;
    },
  },
];

async function main() {
  for (const a of assumptions) {
    const result = await a.verify();
    console.log(`${result ? "✓" : "✗"} ${a.claim}`);
  }
}

main().catch(console.error);

As you can see from this example, this isn't just surface-level. I'm talking about parsing the AST, tracing patterns across files, validating architectural invariants. When these pass, you know the agent understands your codebase.
Best of all, this is reproducible context. You can start a new conversation and have the agent run it to bring itself up to speed immediately. You can run it after merging changes from your upstream branch to confirm the assumptions still hold. It's evergreen research.
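One way to keep it evergreen, sketched under a couple of assumptions: the assumptions array lives in its own module (say research-assumptions.ts) without the top-level runner, and you're willing to fail CI on a broken assumption. The wrapper exits non-zero so an npm script, a CI job, or a fresh agent session can all use the same pass/fail signal:

// verify-research.ts (hypothetical): run the research assumptions and fail loudly.
// Wire it up as e.g. "verify:research": "tsx verify-research.ts" in package.json
// and call it from CI or at the start of a new agent session.
import { assumptions } from "./research-assumptions";

async function main(): Promise<void> {
  let failures = 0;
  for (const a of assumptions) {
    const ok = await a.verify().catch(() => false); // a crashing check counts as a failure
    console.log(`${ok ? "✓" : "✗"} ${a.claim}`);
    if (!ok) failures++;
  }
  if (failures > 0) {
    console.error(`${failures} research assumption(s) no longer hold; re-run the research phase.`);
    process.exitCode = 1;
  }
}

main();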
Planning Phase: Precondition Checks
Planning verification is lighter. You're not re-proving everything, just checking that preconditions for each step actually hold.
Most agentic tools already have a "todo list" feature; Cursor and Claude Code both do. The difference here is that each todo item comes with executable preconditions. Before the agent starts a step, it runs the checks; if they fail, it stops and reports instead of plowing ahead.
import * as fs from "fs";

interface PlanStep {
  name: string;
  preconditions: Array<{
    description: string;
    check: () => boolean;
  }>;
}

export const plan: PlanStep[] = [
  {
    name: "Add OAuth2 provider support",
    preconditions: [
      {
        description: "AuthService exports authenticate method",
        check: () => {
          const content = fs.readFileSync("src/services/AuthService.ts", "utf-8");
          return content.includes("export") && content.includes("authenticate");
        },
      },
      {
        description: "No existing OAuth implementation",
        check: () => {
          return (
            !fs.existsSync("src/services/OAuthService.ts") &&
            !fs.readFileSync("src/services/AuthService.ts", "utf-8").toLowerCase().includes("oauth")
          );
        },
      },
    ],
  },
  {
    name: "Add OAuth callback route",
    preconditions: [
      {
        description: "Auth routes file exists",
        check: () => fs.existsSync("src/routes/auth.ts"),
      },
      {
        description: "Router pattern is express-style",
        check: () => {
          const router = fs.readFileSync("src/routes/auth.ts", "utf-8");
          return router.includes("router.") || router.includes("Router()");
        },
      },
    ],
  },
  {
    name: "Store OAuth tokens in user record",
    preconditions: [
      {
        description: "User repository has update method",
        check: () => {
          const repo = fs.readFileSync("src/repositories/UserRepository.ts", "utf-8");
          return repo.includes("update(") || repo.includes("save(");
        },
      },
    ],
  },
];

// A check that throws (e.g. the file it reads doesn't exist) counts as failed.
const safeCheck = (check: () => boolean): boolean => {
  try {
    return check();
  } catch {
    return false;
  }
};

for (const step of plan) {
  const failing = step.preconditions.filter((p) => !safeCheck(p.check));
  console.log(`[${failing.length === 0 ? "READY" : "BLOCKED"}] ${step.name}`);
  failing.forEach((p) => console.log(`  ✗ ${p.description}`));
}

These checks are deliberately more basic than the research scripts: file existence, string matching, simple structure. That's enough to confirm the plan's assumptions without re-analyzing everything.
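To get the stop-and-report behavior rather than just a printed status, the agent needs a gate in front of each step. Here's a minimal sketch reusing the PlanStep shape above; runAgentStep is a hypothetical stand-in for whatever actually drives implementation:

// Hypothetical gate: only hand a step to the agent once every precondition passes.
async function runGatedStep(
  step: PlanStep,
  runAgentStep: (stepName: string) => Promise<void>
): Promise<void> {
  const failing = step.preconditions.filter((p) => {
    try {
      return !p.check();
    } catch {
      return true; // a throwing check (e.g. missing file) counts as failing
    }
  });

  if (failing.length > 0) {
    // Stop and report instead of plowing ahead.
    console.log(`[BLOCKED] ${step.name}`);
    failing.forEach((p) => console.log(`  ✗ ${p.description}`));
    return;
  }

  await runAgentStep(step.name);
}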
Implementation Phase: Standard Tests
By implementation, verification is just normal software testing. The deep work is done.
import { describe, it, expect } from "vitest";
import request from "supertest";
import { app } from "../src/app";

describe("OAuth callback", () => {
  it("exchanges code for token and redirects", async () => {
    const res = await request(app).get("/auth/oauth/google/callback").query({ code: "test-code" });
    expect(res.status).toBe(302);
    expect(res.headers.location).toBe("/dashboard");
  });

  it("rejects invalid provider", async () => {
    const res = await request(app).get("/auth/oauth/fakeprovider/callback").query({ code: "test-code" });
    expect(res.status).toBe(400);
  });

  it("requires code parameter", async () => {
    const res = await request(app).get("/auth/oauth/google/callback");
    expect(res.status).toBe(400);
  });
});

We're not doing anything fancy here. If research and planning verified correctly, implementation tests just confirm the code works.
The Gradient Matters
Why decrease depth as you progress?
Research is expensive to get wrong. A false assumption propagates through everything. Worth the cost of deep AST analysis.
Planning builds on verified research. You've proven how the codebase works. Now you're just checking integration points exist.
Implementation builds on verified plans. Structure is validated. Write code, run tests.
Front-load verification when it matters most, then coast.
How This Improves on RPI
- Failures are localized. You know which assumption broke and where.
- Agents can self-correct. "Assumption failed: queries bypass repository layer" is actionable; a sketch of this loop follows this list.
- Fewer human touchpoints. RPI needs a human to review research, approve the plan, then check implementation. With executable verification, the machine validates research and planning automatically. You only need to show up at the end. This is how you get to fully autonomous agents.
- Auditable. Every decision has a trail.
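As a rough sketch of that self-correction loop: run the research assumptions, and if any fail, hand the failing claims back to the agent as its next instruction. askAgent is a hypothetical stand-in for your agent driver, and the module path is assumed:

// Hypothetical self-correction loop: failed assumptions become the agent's next
// prompt instead of a markdown summary a human has to re-read.
import { assumptions } from "./research-assumptions";

export async function selfCorrect(askAgent: (prompt: string) => Promise<void>): Promise<void> {
  const failures: string[] = [];
  for (const a of assumptions) {
    const ok = await a.verify().catch(() => false);
    if (!ok) failures.push(a.claim);
  }

  if (failures.length === 0) return; // research still holds; move on to planning

  await askAgent(
    [
      "These research assumptions failed verification:",
      ...failures.map((claim) => `- ${claim}`),
      "Re-read the relevant files and update the research before planning.",
    ].join("\n")
  );
}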
The Tradeoff
Of course, everything has a tradeoff. You're spending more tokens, and you need to teach the agent to run verification at each phase.
But for complex codebases where mistakes are expensive, this beats a markdown-based plan and a prayer to the RNG gods.