Build Log

From Proof of Concept to Self-Improving Kernel

Lane Thompson · February 2026

15 phases
18,500+ lines of IR
50 modules
211KB kernel

Goal

Prove or disprove the core thesis: can AI translate human intent directly into LLVM IR, execute it, and verify the result against the original intent through a closed semantic loop?

This is Generation Zero — built entirely with conventional tools. The question: “is this even remotely workable?”

The Pipeline

  1. Intent Layer — Claude refines natural language into a structured specification (function signature, constraints, test cases)
  2. Meaning Compiler — Claude generates LLVM IR from the spec, validated by llvmlite with a retry loop (up to 5 attempts)
  3. Executor — llvmlite JIT-compiles the IR and runs it against test cases via ctypes FFI
  4. Semantic Bridge — Claude reads the IR and test results, produces a behavioral model of what the system actually does
  5. Alignment Engine — Compares intent against behavior using deterministic and semantic tracks
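
The five stages form a closed loop. A minimal Python sketch of the control flow, with hypothetical stubs standing in for the Claude-backed stages and the llvmlite JIT (none of these names are from the real codebase):

```python
# Hypothetical stubs: the real system calls the Anthropic API for stages
# 1, 2, and 4, and JIT-compiles LLVM IR with llvmlite for stage 3.
def refine_intent(intent):            # 1. Intent Layer
    return {"name": "double", "tests": [(2, 4), (3, 6)]}

def generate_ir(spec):                # 2. Meaning Compiler (stubbed)
    return lambda x: x * 2            # stands in for JIT'd IR

def run_tests(fn, tests):             # 3. Executor
    return [fn(a) == b for a, b in tests]

def aligned(spec, results):           # 4+5. Bridge + Alignment (collapsed)
    return all(results)

def pipeline(intent, max_attempts=5):
    spec = refine_intent(intent)
    for _ in range(max_attempts):     # retry loop, up to 5 attempts
        fn = generate_ir(spec)
        if aligned(spec, run_tests(fn, spec["tests"])):
            return fn
    raise RuntimeError("alignment not reached")

f = pipeline("double a number")
print(f(21))   # 42
```

The real Meaning Compiler emits LLVM IR text and validates it with llvmlite; the stub collapses generation and JIT into one Python callable to show only the loop shape.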

Phase 1: Scalar Functions

The first tests targeted pure scalar functions — integers and floats in, scalar out. No strings, no memory, no I/O.

Test | Function | IR Attempts | Tests | Result | Time
1 | factorial (handwritten IR) | N/A | 7/7 | PASS | <1s
2 | unit tests (3 functions) | N/A | 5/5 | PASS | <1s
3 | factorial (full pipeline) | 1 | 14/14 | PASS | 30.0s
4 | is_prime (full pipeline) | 1 | 25/25 | PASS | 27.7s
5 | fibonacci (full pipeline) | 1 | 19/19 | PASS | 31.2s
6 | integer_square_root | 2 | 16/16 | PASS | 39.9s

Phase 2: String I/O

Extended to functions that read strings (ptr parameters) and write string output via a caller-allocated buffer pattern.
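
The caller-allocated buffer convention can be illustrated with ctypes, the same FFI layer the Executor uses. Here the JIT-compiled function is faked with a ctypes callback behind the same ABI; the signature and names are illustrative, not the project's actual ones:

```python
import ctypes

# Illustrative ABI for a string→string function under the caller-allocated
# buffer pattern:  i64 @reverse_string(ptr %in, ptr %out, i64 %out_cap)
PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int64,                      # bytes written, or -1 on error
    ctypes.c_char_p,                     # input string (NUL-terminated)
    ctypes.POINTER(ctypes.c_char),       # caller-allocated output buffer
    ctypes.c_int64,                      # output buffer capacity
)

@PROTO
def reverse_string(inp, out, cap):       # stand-in for the JIT'd IR
    data = inp[::-1]
    if len(data) + 1 > cap:
        return -1                        # buffer too small
    ctypes.memmove(out, data + b"\x00", len(data) + 1)
    return len(data)                     # length, excluding the NUL

buf = ctypes.create_string_buffer(64)    # the caller allocates
n = reverse_string(b"hello", buf, 64)
print(n, buf.value)                      # 5 b'olleh'
```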

Test | Function | Type | Tests | Result
7-8 | string_length, count_char (hardcoded) | string→scalar | 9/9 | PASS
9 | string_length (full pipeline) | string→scalar | 14/14 | PASS
10 | count_char (full pipeline) | string→scalar | 15/15 | PASS
11-12 | reverse_string, to_uppercase (hardcoded) | string→string | 10/10 | PASS
13 | reverse_string (full pipeline) | string→string | 14/14 | PASS
14 | to_uppercase (full pipeline) | string→string | 14/14 | PASS
15 | to_lowercase (full pipeline) | string→string | 14/14 | PASS

Phase 3: Behavioral Repair Loop

The breakthrough feature: when test cases fail, the Semantic Bridge diagnoses the issue and the Meaning Compiler repairs the IR automatically.

Test | Function | Repair Iterations | Tests | Result
16 | trim_whitespace | 2 (spec bug) | 11/14 | PARTIAL
17 | trim_whitespace (v2) | 1 | 14/14 | PASS
18 | capitalize_words | 0 | 14/14 | PASS
19 | caesar_cipher | 0 | 14/14 | PASS
20 | count_words | 0 | 14/14 | PASS

Repair Loop Architecture

trim_whitespace revealed that the Intent Layer miscounted output string lengths, producing impossible expected values. The repair loop correctly fixed the IR three times but couldn’t overcome bad test expectations. Re-running with a better prompt succeeded immediately.
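
The repair loop's control flow, sketched in Python. diagnose and repair are hypothetical stand-ins for the Semantic Bridge and Meaning Compiler calls, and the "IR" is just an off-by-one function to fix:

```python
def run(fn, case):
    inp, expected = case
    return fn(inp) == expected

def diagnose(fn, failures):                   # Semantic Bridge (stubbed)
    return "outputs are one too large"

def repair(fn, diagnosis):                    # Meaning Compiler (stubbed)
    return lambda x: fn(x) - 1

def repair_loop(fn, tests, max_repairs=3):
    for _ in range(max_repairs + 1):
        failures = [t for t in tests if not run(fn, t)]
        if not failures:
            return fn, True
        fn = repair(fn, diagnose(fn, failures))
    # Correct IR can still fail here if the expectations themselves are
    # wrong -- the trim_whitespace spec-bug case.
    return fn, False

buggy = lambda x: x * 2 + 1                   # intends to double
fixed, ok = repair_loop(buggy, [(2, 4), (5, 10)])
print(ok, fixed(2))    # True 4
```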

Phase 4: Pattern Distillation + Arrays

Pattern distillation captures what works and improves future generation. Array support extended the domain to pointer-to-array functions.

Test | Function | Type | Tests | Result
21-24 | Distillation tests | mixed | 56/56 | PASS
25-30 | bubble_sort, find_second_largest, etc. | array | 60+ | PASS

Phase 5: Program Generation

The final proof of Generation Zero: extending from pure functions to standalone programs. A single prompt produces a Linux binary that serves HTTP — compiled from AI-generated LLVM IR with zero dependencies.

Test | Description | Tests | Result | Time
31 | HTTP hello world server | 2/2 | PASS | 44.1s
32 | Multi-route server (index + build-log) | 2/3 | PASS* | ~50s
33 | Full content server (whitepaper + build-log + index) | This server

*Test 32 “failure” was a spec-level mismatch: the Intent Layer expected “Index” in the response body, but the actual page content says “Semcom”. All routes verified working via manual curl.

What This Server Proves

Generation Zero was built entirely with conventional tools. The question was whether this approach is workable. The answer is yes. What comes next is using the system to help build the next version of itself.

Phase 6: The Self-Improving Kernel

The leap from Python proof-of-concept to a native LLVM IR binary that improves itself. Generation Zero proved the thesis — now the system builds its own successor.

Sub-phases

Phase | Capability | Description
6A | Raw syscalls | read, write, open, close, mmap, exit — all in LLVM IR
6B | TLS client | BearSSL integration for HTTPS to api.anthropic.com
6C | DNS resolver | Resolve api.anthropic.com from LLVM IR
6D | HTTP client | Full HTTP/1.1 in LLVM IR — buf_append builder pattern, error propagation via -1
6E | JSON parser | json_find_pattern, json_read_string, json_get_content_text
6F | JSON emitter | Construct Claude API request payloads in LLVM IR
6G | Claude API integration | Prompt construction with manifest + source context enrichment
6H | Self-improvement loop | Read task, generate IR, validate with llc, repair up to 5x, write back to source tree, rebuild
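
The buf_append builder pattern from 6D can be sketched in Python: each call returns the new cursor position, and a -1 from any earlier call propagates through the rest of the chain without extra branching. The signature is illustrative; the real version is LLVM IR:

```python
def buf_append(buf: bytearray, off: int, cap: int, data: bytes) -> int:
    if off < 0:                      # propagate an earlier failure
        return -1
    if off + len(data) > cap:        # would overflow: signal failure
        return -1
    buf[off:off + len(data)] = data
    return off + len(data)           # new cursor position

buf = bytearray(1024)
off = buf_append(buf, 0, 1024, b"GET / HTTP/1.1\r\n")
off = buf_append(buf, off, 1024, b"Host: api.anthropic.com\r\n\r\n")
print(off, bytes(buf[:off]).decode().splitlines()[0])
```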

Result

209KB native binary. The kernel modified its own source code (json.ll) via the @target directive, validated the generated IR with llc, wrote it back to the source tree, and rebuilt all binaries. First confirmed self-modification.

Phase 7: Directives and Capabilities

The kernel grew new capabilities through a directive system — each directive is a new mode of operation triggered by a prefix in the task file.

Directive | Purpose
@target filename.ll | Modify an existing module
@target filename.ll append | Add new functions to a module (Claude outputs only new code, kernel merges with existing source)
@create filename.ll | Generate a new module from scratch
@plan | English intent → Claude plans multi-step build → kernel executes each step
@test | Semantic verification: generate test harness via 2nd Claude call, compile, link, execute, behavioral repair up to 3x
@transform | Scan external codebase, plan transformation, execute multi-step port to LLVM IR
@help | Self-implemented — the kernel added this directive to itself using @target

Multi-step orchestration: Tasks delimited by ---, executed sequentially with manifest regeneration between steps. The kernel processes each step through the full pipeline (generate, validate, repair, write-back) before advancing to the next.
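
The orchestration loop, sketched in Python with hypothetical stand-ins for the per-step pipeline and manifest regeneration:

```python
def run_task_file(text, process_step, regen_manifest):
    # Tasks are delimited by "---" lines and executed in order.
    steps = [s.strip() for s in text.split("\n---\n") if s.strip()]
    for i, step in enumerate(steps, 1):
        print(f"step {i}/{len(steps)}: {step.splitlines()[0]}")
        process_step(step)      # generate, validate, repair, write back
        regen_manifest()        # later steps see the new functions
    return len(steps)

task = "@create stack.ll\npush/pop\n---\n@create rpn.ll\nuses @ds_push"
n = run_task_file(task, lambda s: None, lambda: None)
print("passed:", n)
```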

Phase 8: Self-Hosting

The kernel eliminated its own Python dependencies — the tools it relied on for build infrastructure were rewritten in LLVM IR by the kernel itself.

Tool Ported | Original | LLVM IR Version | Lines
Byte-count corrector | fix_constants.py | fix_constants_ir.ll | 382
Manifest generator | gen_manifest.py | gen_manifest_ir.ll | 703

Zero Python dependencies remain. Both tools use sys_mmap for dynamic allocation. Critical detail: must use Linux mmap flag values even on macOS due to the syscall translation layer.
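
The byte-count corrector's core transform can be sketched in Python. The self-hosted version is 382 lines of LLVM IR; this regex version is illustrative only. It relies on the fact that in an LLVM IR string literal, a \XX escape (backslash plus two hex digits) encodes exactly one byte:

```python
import re

def fix_byte_counts(ir: str) -> str:
    # Match [N x i8] c"..." and recompute N from the literal itself.
    pat = re.compile(r'\[\s*\d+\s*x\s*i8\]\s*c"((?:\\[0-9A-Fa-f]{2}|[^"\\])*)"')
    def repl(m):
        lit = m.group(1)
        # Each \XX escape or plain character is one byte.
        n = len(re.findall(r'\\[0-9A-Fa-f]{2}|[^"\\]', lit))
        return f'[{n} x i8] c"{lit}"'
    return pat.sub(repl, ir)

bad = '@msg = private constant [5 x i8] c"hello\\0A\\00"'
print(fix_byte_counts(bad))
# @msg = private constant [7 x i8] c"hello\0A\00"
```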

Phase 9: Library Verification

Full behavioral verification of the generated library using the @test directive. The kernel generates test harnesses via a second Claude API call, compiles and links them, executes, and checks exit codes.

Category | Functions Tested | Result
Math | factorial, fibonacci | PASS
Predicates | is_prime | PASS
String operations | string_length, reverse, uppercase, lowercase, trim, capitalize, caesar_cipher, count_words, count_char | PASS
Array operations | bubble_sort, find_second_largest | PASS
Searching | binary_search | PASS
Hashing | djb2_hash | PASS
Numeric | integer_square_root, gcd | PASS
Encoding | base64_encode, base64_decode | PASS

21 functions tested, 21 passed.

Phase 10: Intent Compilation

Programs built from pure English intent — a single sentence becomes a working binary.

Program | Intent | Binary Size | Result
TCP echo server | “Build a TCP echo server on port 8080” | 50KB | All tests pass
HTTP content server | “Build an HTTP server with routing for /, /whitepaper, and /build-log” | 48KB | All routes verified
Base64 library | “Implement base64 encoding and decoding” | (module) | Verified via @test

The HTTP content server is the binary serving this page.

@transform Stress Tests

A 5-file Python project (sorting, searching, bitwise, numeric, hashing) was ported to LLVM IR using the @transform directive: 43 functions, 1,134 lines of LLVM IR. 4 of 5 modules compiled on the first try. The fifth needed one repair iteration.

A 5-file Python Forth interpreter was ported via @transform: 6 LLVM IR modules, 57KB binary. Supports arithmetic, stack operations, comparisons, DO..LOOP with I/J, IF/ELSE/THEN, word definitions, EMIT, CR, and user-defined words. Factorial, fibonacci, fizzbuzz all pass. Required 12 manual cross-module signature fixes — the primary motivation for the manifest signature enhancement in Phase 12.

Phase 11: Deterministic Fixup Passes and CI/CD

Two new deterministic post-processing passes joined the existing fix_constants pass in the kernel pipeline, catching and fixing patterns that Claude generates incorrectly. All three run after code generation but before llc validation — pure string transforms, no AI involved.

Pass | What It Fixes | Lines of IR
fix_constants | Corrects [N x i8] byte-count mismatches in string constants | 382
fix_declares | Replaces incorrect declare signatures with correct ones from the manifest | 338
fix_syscalls | Rewrites raw @syscall(i64 N, ...) to platform-agnostic sys_* wrappers | 380

The fix_syscalls pass addresses an abstraction layer violation: Claude sometimes generates raw Linux syscall numbers instead of the project’s cross-platform wrapper functions. On macOS, Linux syscall numbers map to completely different operations — syscall 9 (mmap on Linux) is something else entirely on Darwin. The pass detects which syscall numbers are used, replaces the variadic @syscall declare with specific sys_* declares, and rewrites each call site.
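
The call-site rewrite can be sketched as a string transform in Python. The syscall-number map and the exact call syntax matched here are illustrative, and the declare-replacement step is omitted:

```python
import re

# Illustrative subset of the Linux syscall-number → wrapper map.
LINUX_SYSCALLS = {0: "sys_read", 1: "sys_write", 9: "sys_mmap", 60: "sys_exit"}

def fix_syscalls(ir: str) -> str:
    # Rewrite: call i64 (i64, ...) @syscall(i64 N, <args>)
    #      →   call i64 @sys_*(<args>)
    pat = re.compile(r'call i64 \(i64, \.\.\.\) @syscall\(i64 (\d+)(?:, )?')
    def repl(m):
        return f'call i64 @{LINUX_SYSCALLS[int(m.group(1))]}('
    return pat.sub(repl, ir)

raw = '%n = call i64 (i64, ...) @syscall(i64 1, i64 %fd, ptr %buf, i64 %len)'
print(fix_syscalls(raw))
# %n = call i64 @sys_write(i64 %fd, ptr %buf, i64 %len)
```

The real pass also detects which syscall numbers are used and swaps the variadic @syscall declare for the specific sys_* declares.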

Brainfuck @transform

To validate the new passes, a Python brainfuck interpreter was ported to LLVM IR via @transform. Three progressive runs were needed:

  1. Run 1: 3/4 modules compiled. interpreter.ll failed (raw @syscall usage, unbalanced braces) — exposed the syscall abstraction bug
  2. Run 2: after adding the fix_syscalls pass, 4/4 modules compiled, but tape.ll still used raw syscalls (the pass missed it because its call pattern differed)
  3. Run 3: After fixing tape.ll manually. All 4 modules compile and link. 36KB binary passes all tests: nested loops, Hello World, digit sequences, arithmetic

CI/CD Pipeline

Deployment is now fully automated via GitHub Actions. Pushing to main triggers:

  1. Install LLVM 17 on Ubuntu runner
  2. Generate content.ll from HTML pages
  3. Cross-compile all LLVM IR to x86_64 Linux objects
  4. Link static binary with ld.lld
  5. Push to ECR via crane
  6. Trigger AWS App Runner deployment
  7. Verify all endpoints respond

First run: all green in 3 minutes 9 seconds.

Phase 12: Manifest Signatures and Forth Interpreter

The Forth interpreter’s 12 cross-module signature mismatches revealed the biggest remaining gap: when Claude generates module B that calls functions in module A, it guesses the signatures. Common errors: void vs i64 returns, i64 vs i1 boolean confusion, inventing extra parameters.

The Fix: Manifest Signatures

The self-hosted manifest generator (gen_manifest_ir.ll) was enhanced to extract full function signatures from define lines. Previously the manifest listed:

ds_push|forth_stack.ll|public|15

Now it includes the complete signature:

ds_push|forth_stack.ll|public|15|void @ds_push(i64 %val)

Since the kernel dumps the entire manifest into every Claude prompt, the model now sees correct types for every function when generating new modules. Combined with the fix_declares pass (which corrects any remaining mismatches against the manifest), this eliminates the cross-module signature hallucination class entirely.
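
A Python sketch of declare correction against the signature-bearing manifest format shown above. The real fix_declares pass is LLVM IR; parsing here is deliberately simplified, and the manifest contents are illustrative:

```python
# Manifest format from the build log: name|module|visibility|lines|signature
MANIFEST = """\
ds_push|forth_stack.ll|public|15|void @ds_push(i64 %val)
ds_pop|forth_stack.ll|public|12|i64 @ds_pop()
"""

sigs = {line.split("|")[0]: line.split("|")[4]
        for line in MANIFEST.strip().splitlines()}

def fix_declare(line: str) -> str:
    # Extract the function name from e.g. "declare i64 @ds_push(i64, i64)".
    name = line.split("@", 1)[1].split("(", 1)[0].strip()
    if name in sigs:
        return "declare " + sigs[name]   # replace with the true signature
    return line                          # unknown: left for fuzzy lookup

print(fix_declare("declare i64 @ds_push(i64, i64)"))
# declare void @ds_push(i64 %val)
```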

Verification

A 3-file Python stack calculator project was transformed via @transform to validate the fix:

Metric | Forth (before fix) | Calculator (after fix)
Cross-module signatures | 12 mismatches | 0 mismatches
Manual fixes needed | 14 total | 0
Repair iterations | multiple | 0
Binary works | after manual fixes | immediately

15 out of 15 cross-module function calls had perfect signature matches. All three modules (stack, math_ops, calculator) compiled and linked on the first attempt, producing a 35KB binary that passes all tests.

Buffer Overflow Fix

The manifest generator had a latent bug: its defined_funcs array was sized for 256 entries, but the codebase had grown to 282+ functions. This caused stack corruption and a SIGSEGV crash. Fixed by increasing all three internal arrays from 256 to 512 entries.

Phase 13: Manifest On-Demand

The manifest (/tmp/kernel_manifest.txt) is the system’s ABI discovery layer — every module discovers how to call every other module through it. By Phase 12 it had grown to 39KB (~391 functions). But it was loaded wholesale: declare_fix.ll read the entire file into a 65KB buffer and linearly scanned it for each declare. Worse, kernel.ll loaded it into an 8KB buffer for Claude prompts — silently truncating 80% of the manifest. Claude could only see the first ~60 functions out of 391.

The fix: turn the manifest from a document into a queryable service with three new LLVM IR functions in manifest.ll.

Three New Functions

Function | Purpose | Lines of IR
manifest_lookup_signature | On-demand exact lookup: given a function name, returns its full signature from the manifest | ~120
manifest_collect_relevant | Scans source + task text for @name references, deduplicates, looks up each in the manifest, writes only matching entries to the output buffer | ~380
manifest_find_similar | Fuzzy “did you mean?” lookup: scores every function by prefix overlap and suffix match, returns the best match above a threshold | ~300

Two private helpers support these: _is_ident_char (character classification) and _cr_add_name (deduplicating name table insertion).
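
The fuzzy scoring idea can be sketched in Python as prefix overlap plus a suffix-match bonus. The exact scoring function and threshold in manifest_find_similar are not documented here, so these values are illustrative:

```python
def score(a: str, b: str) -> int:
    prefix = 0
    for x, y in zip(a, b):            # length of the common prefix
        if x != y:
            break
        prefix += 1
    # Bonus when the last underscore-separated component matches.
    suffix = 4 if a.split("_")[-1] == b.split("_")[-1] else 0
    return prefix + suffix

def find_similar(name, known, threshold=4):
    best = max(known, key=lambda k: score(name, k))
    return best if score(name, best) >= threshold else None

known = ["manifest_lookup_signature", "buf_append", "sys_write"]
print(find_similar("buffer_append", known))   # buf_append
```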

How It Changed the Pipeline

declare_fix.ll (400 → 284 lines): Removed the 65KB manifest buffer, bulk loading, and linear scanning. Each declare line now triggers a single manifest_lookup_signature call. If exact match fails, falls back to manifest_find_similar for fuzzy correction. The manifest is opened, searched, and freed per lookup — simple and correct.

kernel.ll (3 call sites changed): Replaced manifest_load(ptr, 8192) with manifest_collect_relevant(source_ptr, src_len, task_ptr, task_len, out, 8192) at the transform planning, @plan, and compile sites. Instead of dumping the first 8KB of a 39KB file, the kernel now extracts only the functions referenced in the source code and task description. A typical @create task generates ~2KB of focused manifest context — well within the 8KB budget.
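
The reference-scanning step can be sketched in Python: collect @name tokens from source and task text, deduplicate in order, and emit only the matching manifest entries. The manifest contents here are illustrative:

```python
import re

MANIFEST = {
    "sys_write": "i64 @sys_write(i64 %fd, ptr %buf, i64 %len)",
    "copy_bytes": "i64 @copy_bytes(ptr %dst, ptr %src, i64 %n)",
    "ds_push": "void @ds_push(i64 %val)",
}

def collect_relevant(source: str, task: str) -> str:
    names = []
    for m in re.finditer(r"@([A-Za-z_][A-Za-z0-9_]*)", source + " " + task):
        if m.group(1) not in names:          # deduplicate, keep order
            names.append(m.group(1))
    return "\n".join(MANIFEST[n] for n in names if n in MANIFEST)

task = "use @sys_write and @copy_bytes to emit the result"
print(collect_relevant("", task))
```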

Verification

A @create string_utils.ll task was run end-to-end through the updated pipeline:

manifest: ok  (manifest_collect_relevant found @sys_write, @copy_bytes from task text)
api: ok       (2977 bytes generated, first attempt)
validate: ok  (no repair needed)
write-back: ok

The kernel correctly identified the two functions mentioned in the task description, looked up their signatures, and passed focused context to Claude. The generated module compiled on the first attempt with correct cross-module declares.

Impact

Metric | Before | After
Manifest context for Claude | First 8KB (truncated, ~60 functions) | Only referenced functions (~2KB, complete)
declare_fix.ll manifest access | 65KB bulk load + linear scan | Per-declare on-demand lookup
Fuzzy name correction | None | Prefix + suffix scoring, auto-correct above threshold
manifest.ll | 474 lines | 1,421 lines
declare_fix.ll | 400 lines | 284 lines
Kernel binary | 209KB | 210KB

Phase 14: doesNotUnderstand + Retry Resilience

The kernel had a blind spot: when Claude generated code calling a function that doesn’t exist in the codebase, the system had no way to detect or recover. The declare_fix.ll pass would silently pass it through, llc would accept the unresolved declare, and the linker would fail — with no diagnostic and no recovery path.

Phase 14 adds doesNotUnderstand — a detection and auto-recovery system inspired by Smalltalk’s message-not-understood protocol.

Missing Function Detection (v1)

declare_fix.ll now records functions that fail both exact and fuzzy manifest lookup. After checking manifest_is_shim to exclude known C wrappers, unresolved names are written to /tmp/kernel_missing.txt in the format name|declare line\n. The kernel reads this file after the fixup pipeline and prints a diagnostic.

Auto-Generation (v2)

Detection alone isn’t enough. When missing functions are found, the kernel now builds a @create auto_missing.ll task containing the missing function declarations, saves it to the retry buffer alongside the original step’s task, and skips the current step. On the retry pass, the @create executes first — generating the missing module with correct implementations — then the original step retries with an updated manifest that includes the new functions.

Retry Improvements

Feature | Before | After
Error context in retry | Only task text preserved | Task text + compiler stderr from failed attempt
Retry attempts | 1 pass | Up to 2 passes
Repair prompt context | Manifest + failed IR + errors | + original task description
Fuzzy match hints | Silent replacement | NOTE: fuzzy match replaced: <original> comment

Buffer Scaling

Manifest read buffers scaled from 64KB to 128KB, name tables from 32KB to 64KB (256 entries). Prevents silent truncation as the codebase grows — the manifest reached 39KB in Phase 13, leaving only 25KB of headroom in the old buffers.

Bug Found and Fixed

Testing exposed an infinite loop in manifest_is_shim: when manifest_next_line returned -1 at a section boundary, the function didn’t check for the negative offset. A GEP with offset -1 read memory before the buffer, and the loop spun forever at 100% CPU. Any function name not in the manifest’s @shims section triggered it — including common names like malloc, realloc, free. One-line fix: check is_next_off < 0 before looping back.

Verification

@transform transform_targets/rpn_calc
step 1/5: @create stack.ll        validate: ok
step 2/5: @create rpn.ll          validate: ok
step 3/5: @create istack.ll       validate: ok
step 4/5: @create stats.ll        validate: ok
step 5/5: @create evaluator.ll    validate: ok
passed: 5  skipped: 0

A 2-file Python RPN calculator was transformed into 5 LLVM IR modules. All steps passed on the first attempt. The manifest_is_shim fix was verified with a dedicated test binary: malloc, realloc, free, and strlen all complete instantly where before they caused an infinite hang.

Phase 15: Dynamic Routing + HTMX Framework

The HTTP server had a limitation: every route was a hardcoded branch chain in http_server.ll. Adding a new page meant modifying the server binary, recompiling, and redeploying. Phase 15 replaces this with a data-driven routing framework — and designs the architecture for two future capabilities: route doesNotUnderstand (unknown routes trigger handler generation) and WASM output (handlers compile to WebAssembly for browser-side execution).

Architecture

Before:  request → hardcoded branch chain → static constant → response
After:   request → route_table lookup → handler(ctx, buf, size) → response
Future:  request → route_table miss → doesNotUnderstand → @create handler → response

Three new LLVM IR modules implement the framework:

Module | Purpose | Lines of IR
route_table.ll | Vec-backed route table: path + method → handler function pointer. Linear scan lookup. | ~160
handler.ll | Request context struct (64 bytes), handler dispatch with indirect call. Returns -1 on route miss (the doesNotUnderstand hook). | ~130
html_builder.ll | HTML fragment builder with entity escaping (< → &lt;, etc.). Pure buffer operations, no syscalls — WASM-safe. | ~210
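
The entity escaping is a pure buffer transform with no syscalls, which is the property that keeps html_builder WASM-safe. A Python sketch of the four-character mapping:

```python
# The four HTML special characters and their entities.
ESCAPES = {"<": "&lt;", ">": "&gt;", "&": "&amp;", '"': "&quot;"}

def html_escape(s: str) -> str:
    # Pure function: string in, string out, no I/O.
    return "".join(ESCAPES.get(c, c) for c in s)

print(html_escape('Hello <world> & "friends"'))
# Hello &lt;world&gt; &amp; &quot;friends&quot;
```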

Handler Signature

Every handler has the same pure signature:

i64 @handler(ptr %req_ctx, ptr %resp_buf, i64 %buf_size)
  ; returns: bytes written to resp_buf, or -1 on error

No syscalls, no file I/O, no global state. Pure (input) → (output buffer). This is exactly what WASM can do — linear memory in, linear memory out. A future WASM compile target just needs to swap the buffer passing convention.
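
A Python sketch of the dispatch convention: table lookup, indirect call, -1 on a route miss. The handler body and route contents are illustrative:

```python
def handle_index(ctx, buf, size):
    body = b"<h1>Semcom</h1>"
    if len(body) > size:
        return -1                    # buffer too small
    buf[:len(body)] = body
    return len(body)                 # bytes written

ROUTES = {("GET", "/"): handle_index}

def dispatch(method, path, ctx, buf, size):
    handler = ROUTES.get((method, path))
    if handler is None:
        return -1    # route miss → 404 today, @create a handler later
    return handler(ctx, buf, size)

buf = bytearray(1024)
n = dispatch("GET", "/", {}, buf, len(buf))
print(n, bytes(buf[:n]))
```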

Routes Registered

Method | Path | Handler | Type
GET | / | handle_index | Static page (memcpy from content.ll)
GET | /whitepaper | handle_whitepaper | Static page
GET | /build-log | handle_build_log | Static page
GET | /api/stats | handle_stats | HTMX fragment (html_builder)
GET | /fragment/routes | handle_routes | Dynamic — walks route_table
POST | /api/echo | handle_echo | Entity-escaped POST body echo

Verification

$ curl localhost:8080/                                200 (7454B)
$ curl localhost:8080/whitepaper                      200 (21304B)
$ curl localhost:8080/build-log                       200 (34018B)
$ curl localhost:8080/nonexistent                     404 (9B)
$ curl -X PUT localhost:8080/                         405 (18B)
$ curl localhost:8080/api/stats
  <div id="stats">kernel: 211KB, 35+ modules</div>
$ curl localhost:8080/fragment/routes
  <ul id="routes"><li>/</li><li>/whitepaper</li>...</ul>
$ curl -X POST -d 'Hello <world> & "friends"' localhost:8080/api/echo
  <div id="echo">Hello &lt;world&gt; &amp; &quot;friends&quot;</div>

Static pages serve identically to before. The 404 path returns “Not Found”. Unsupported methods return 405. HTMX fragments return proper HTML with entity escaping. The POST echo handler correctly escapes all four HTML special characters.

doesNotUnderstand for Routes

The -1 return from handler_dispatch is the hook point. When route_lookup returns null, the server currently sends a 404. In the future, this becomes the trigger for: construct a @create route_handler_<path>.ll task, kernel generates the handler via Claude API, compile, register in route_table, and dispatch. The Smalltalk pattern from Phase 14 — applied to HTTP routes.

Binary Sizes

Binary | Before | After
http_server (macOS native) | 83KB | 102KB
http_server (Linux static) | 66KB | 75KB

Cumulative Statistics

Phase | What Was Built | Key Metric
1. Scalar Functions | factorial, is_prime, fibonacci, integer_square_root | 74 tests, 100% pass
2. String I/O | string_length, count_char, reverse, uppercase, lowercase | 141 tests, 97%+ pass
3. Behavioral Repair | trim, capitalize_words, caesar_cipher, count_words | Automated diagnosis + fix
4. Pattern Distillation | bubble_sort, find_second_largest, array operations | 116+ tests, self-improving
5. Program Generation | HTTP servers from intent | Zero-dependency Linux binaries
6. Self-Improving Kernel | 209KB native binary with TLS, DNS, HTTP, JSON, Claude API | First self-modification confirmed
7. Directives | @target, @create, @plan, @test, @transform, @help | 7 directives, multi-step orchestration
8. Self-Hosting | fix_constants_ir.ll, gen_manifest_ir.ll | Zero Python dependencies
9. Library Verification | Full behavioral test suite | 21/21 functions pass
10. Intent Compilation | TCP server, HTTP server, base64, @transform 43 functions | English → working binary
11. Fixup Passes + CI/CD | fix_syscalls.ll, declare_fix.ll, brainfuck interpreter, GitHub Actions | 3 deterministic passes, automated deploy
12. Manifest Signatures | Full signatures in manifest, Forth interpreter (57KB), stack calculator (35KB) | 15/15 cross-module signatures correct, 0 manual fixes
13. Manifest On-Demand | manifest_lookup_signature, manifest_collect_relevant, manifest_find_similar | 8KB truncation fixed, focused context for Claude
14. doesNotUnderstand | Missing function detection + auto-generation, retry improvements, buffer scaling | Infinite loop bug fixed, 2-pass retry with error context
15. Dynamic Routing | route_table.ll, handler.ll, html_builder.ll, data-driven HTTP dispatch, HTMX fragments | 6 routes, POST support, doesNotUnderstand hook for routes

Conclusions

The thesis is validated — and then the system built its own next version.

  1. Generation Zero (Phases 1–5) proved that AI can translate natural language intent directly into correct LLVM IR across multiple domains — scalars, strings, arrays, programs
  2. The closed semantic loop works in practice: intent specification, IR compilation, execution, behavioral analysis, and alignment checking
  3. Behavioral repair is possible: when tests fail, the bridge diagnoses the issue and the compiler fixes the IR
  4. Pattern distillation captures what works and improves future generation quality
  5. The system then built its own successor (Phases 6–15): a 211KB native kernel that calls Claude to generate, validate, repair, and deploy LLVM IR — with zero Python dependencies
  6. The kernel achieved self-modification: it rewrote its own source modules, validated them, and rebuilt itself
  7. Intent compilation is real: a single English sentence becomes a working binary via the kernel’s directive system

Generation Zero was built with conventional tools to answer whether the approach is workable. It was. Then the system built its own next version — a native kernel that reads English intent, generates LLVM IR, runs deterministic fixup passes, validates it, repairs it, and rebuilds itself. It ported a Forth interpreter from Python to native code, eliminated its own biggest failure class by teaching itself correct function signatures, and converted its bulk manifest loading into on-demand queries. It learned to detect and auto-generate missing functions when Claude references code that doesn’t exist yet, then grew a dynamic routing framework where every handler is a pure function — WASM-portable by design, with a doesNotUnderstand hook ready to auto-generate new routes from English intent. Deployment is automated end-to-end. The compiler that understands meaning is no longer a thesis. It exists.

Explore Further

These pages do not exist yet. Each link triggers the server’s doesNotUnderstand pattern.

Self-Repair Log · Pattern Archaeology · IR Cartography