From Proof of Concept to Self-Improving Kernel
Prove or disprove the core thesis: can AI translate human intent directly into LLVM IR, execute it, and verify the result against the original intent through a closed semantic loop?
This is Generation Zero — built entirely with conventional tools. The question: “is this even remotely workable?”
The first tests targeted pure scalar functions — integers and floats in, scalar out. No strings, no memory, no I/O.
| Test | Function | IR Attempts | Tests | Result | Time |
|---|---|---|---|---|---|
| 1 | factorial (handwritten IR) | N/A | 7/7 | PASS | <1s |
| 2 | unit tests (3 functions) | N/A | 5/5 | PASS | <1s |
| 3 | factorial (full pipeline) | 1 | 14/14 | PASS | 30.0s |
| 4 | is_prime (full pipeline) | 1 | 25/25 | PASS | 27.7s |
| 5 | fibonacci (full pipeline) | 1 | 19/19 | PASS | 31.2s |
| 6 | integer_square_root | 2 | 16/16 | PASS | 39.9s |
Extended to functions that read strings (ptr parameters) and write string output via a caller-allocated buffer pattern.
| Test | Function | Type | Tests | Result |
|---|---|---|---|---|
| 7-8 | string_length, count_char (hardcoded) | string→scalar | 9/9 | PASS |
| 9 | string_length (full pipeline) | string→scalar | 14/14 | PASS |
| 10 | count_char (full pipeline) | string→scalar | 15/15 | PASS |
| 11-12 | reverse_string, to_uppercase (hardcoded) | string→string | 10/10 | PASS |
| 13 | reverse_string (full pipeline) | string→string | 14/14 | PASS |
| 14 | to_uppercase (full pipeline) | string→string | 14/14 | PASS |
| 15 | to_lowercase (full pipeline) | string→string | 14/14 | PASS |
- getelementptr + load/store patterns on first attempt
- select instruction proved critical for min/max/clamp operations, avoiding phi node predecessor errors

The breakthrough feature: when test cases fail, the Semantic Bridge diagnoses the issue and the Meaning Compiler repairs the IR automatically.
| Test | Function | Repair Iterations | Tests | Result |
|---|---|---|---|---|
| 16 | trim_whitespace | 2 (spec bug) | 11/14 | PARTIAL |
| 17 | trim_whitespace (v2) | 1 | 14/14 | PASS |
| 18 | capitalize_words | 0 | 14/14 | PASS |
| 19 | caesar_cipher | 0 | 14/14 | PASS |
| 20 | count_words | 0 | 14/14 | PASS |
trim_whitespace revealed that the Intent Layer miscounted output string lengths, producing impossible expected values. The repair loop correctly fixed the IR three times but couldn’t overcome bad test expectations. Re-running with a better prompt succeeded immediately.
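The closed loop described above can be pictured as a small driver. This is a minimal sketch, not the project's implementation: generate, run_tests, and repair are hypothetical stand-ins for the Meaning Compiler, test execution, and the Semantic Bridge's diagnosis-plus-repair, and the repair budget of 3 is an assumption.

```python
def closed_loop(generate, run_tests, repair, max_repairs=3):
    """Generate IR, run the intent-derived tests, and feed failures back into repair.

    generate/run_tests/repair are hypothetical callables standing in for the
    pipeline stages. Returns (ir, repairs_used), or (None, max_repairs) if the
    budget runs out.
    """
    ir = generate()
    for repairs in range(max_repairs + 1):
        failures = run_tests(ir)
        if not failures:
            return ir, repairs
        if repairs < max_repairs:
            ir = repair(ir, failures)  # diagnosis + repair produce a new candidate
    return None, max_repairs


# Converges after two repairs.
print(closed_loop(lambda: 0, lambda ir: [] if ir == 2 else ["mismatch"], lambda ir, f: ir + 1))
# A test with an impossible expectation never passes, whatever the repair does.
print(closed_loop(lambda: 0, lambda ir: ["impossible expectation"], lambda ir, f: ir + 1))
```

The second call mirrors the trim_whitespace case: when an expected value is impossible, no IR repair can make run_tests return an empty list, so the loop exhausts its budget.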
Pattern distillation captures what works and improves future generation. Array support extended the domain to pointer-to-array functions.
| Test | Function | Type | Tests | Result |
|---|---|---|---|---|
| 21-24 | Distillation tests | mixed | 56/56 | PASS |
| 25-30 | bubble_sort, find_second_largest, etc. | array | 60+ | PASS |
The final proof of Generation Zero: extending from pure functions to standalone programs. A single prompt produces a Linux binary that serves HTTP — compiled from AI-generated LLVM IR with zero dependencies.
- gen-content.py → LLVM IR global constants (content.ll)
- llc + ld.lld inside Alpine Docker → static ELF binary
- FROM scratch container, raw Linux syscalls, no libc

| Test | Description | Tests | Result | Time |
|---|---|---|---|---|
| 31 | HTTP hello world server | 2/2 | PASS | 44.1s |
| 32 | Multi-route server (index + build-log) | 2/3 | PASS* | ~50s |
| 33 | Full content server (whitepaper + build-log + index) | — | This server | — |
*Test 32 “failure” was a spec-level mismatch: the Intent Layer expected “Index” in the response body, but the actual page content says “Semcom”. All routes verified working via manual curl.
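The gen-content.py step listed above turns page content into LLVM IR global constants. The sketch below illustrates that idea and is not the actual script. It assumes printable ASCII is emitted verbatim and every other byte, plus the quote and backslash, becomes a \XX escape. The function names are hypothetical.

```python
def ir_escape(data: bytes) -> str:
    """Render bytes as the body of an LLVM IR c"..." string (hypothetical helper)."""
    parts = []
    for b in data:
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):  # printable, but not '"' or '\'
            parts.append(chr(b))
        else:
            parts.append(f"\\{b:02X}")  # two uppercase hex digits, e.g. \0A
    return "".join(parts)


def to_ir_constant(name: str, data: bytes) -> str:
    """Emit a global whose [N x i8] length is taken from the bytes themselves."""
    return f'@{name} = private unnamed_addr constant [{len(data)} x i8] c"{ir_escape(data)}"'


print(to_ir_constant("page", b'Hi "x"\n'))
```

Because N is computed from the bytes rather than written by hand, the [N x i8] length cannot drift from the payload.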
Generation Zero was built entirely with conventional tools. The question was whether this approach is workable. The answer is yes. What comes next is using the system to help build the next version of itself.
The leap from Python proof-of-concept to a native LLVM IR binary that improves itself. Generation Zero proved the thesis — now the system builds its own successor.
| Phase | Capability | Description |
|---|---|---|
| 6A | Raw syscalls | read, write, open, close, mmap, exit — all in LLVM IR |
| 6B | TLS client | BearSSL integration for HTTPS to api.anthropic.com |
| 6C | DNS resolver | Resolve api.anthropic.com from LLVM IR |
| 6D | HTTP client | Full HTTP/1.1 in LLVM IR — buf_append builder pattern, error propagation via -1 |
| 6E | JSON parser | json_find_pattern, json_read_string, json_get_content_text |
| 6F | JSON emitter | Construct Claude API request payloads in LLVM IR |
| 6G | Claude API integration | Prompt construction with manifest + source context enrichment |
| 6H | Self-improvement loop | Read task, generate IR, validate with llc, repair up to 5x, write back to source tree, rebuild |
209KB native binary. The kernel modified its own source code (json.ll) via @target directive, validated the generated IR with llc, wrote it back to the source tree, and rebuilt all binaries. First confirmed self-modification.
The kernel grew new capabilities through a directive system — each directive is a new mode of operation triggered by a prefix in the task file.
| Directive | Purpose |
|---|---|
| @target filename.ll | Modify an existing module |
| @target filename.ll append | Add new functions to a module (Claude outputs only new code, kernel merges with existing source) |
| @create filename.ll | Generate a new module from scratch |
| @plan | English intent → Claude plans multi-step build → kernel executes each step |
| @test | Semantic verification: generate test harness via 2nd Claude call, compile, link, execute, behavioral repair up to 3x |
| @transform | Scan external codebase, plan transformation, execute multi-step port to LLVM IR |
| @help | Self-implemented: the kernel added this directive to itself using @target |
Multi-step orchestration: Tasks delimited by ---, executed sequentially with manifest regeneration between steps. The kernel processes each step through the full pipeline (generate, validate, repair, write-back) before advancing to the next.
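A task file could be split into steps along these lines. This sketch assumes the --- delimiter sits on its own line and the directive is the first word of each step's first line. parse_tasks and the step dictionary layout are hypothetical.

```python
import re

DIRECTIVES = {"@target", "@create", "@plan", "@test", "@transform", "@help"}


def parse_tasks(text):
    """Split a task file on lines containing only --- and read each step's directive."""
    steps = []
    for chunk in re.split(r"^---[ \t]*$", text, flags=re.MULTILINE):
        chunk = chunk.strip()
        if not chunk:
            continue
        first, _, body = chunk.partition("\n")
        words = first.split()
        if words and words[0] in DIRECTIVES:
            steps.append({"directive": words[0], "args": words[1:], "body": body.strip()})
        else:
            # No directive prefix: treat the whole chunk as plain intent.
            steps.append({"directive": None, "args": [], "body": chunk})
    return steps


for step in parse_tasks("@create a.ll\nWrite a stack.\n---\n@target b.ll append\nAdd pop.\n"):
    print(step["directive"], step["args"], repr(step["body"]))
```

In the kernel, each parsed step would then go through generate, validate, repair, and write-back, with the manifest regenerated before the next step starts.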
The kernel eliminated its own Python dependencies — the tools it relied on for build infrastructure were rewritten in LLVM IR by the kernel itself.
| Tool Ported | Original | LLVM IR Version | Lines |
|---|---|---|---|
| Byte-count corrector | fix_constants.py | fix_constants_ir.ll | 382 |
| Manifest generator | gen_manifest.py | gen_manifest_ir.ll | 703 |
Zero Python dependencies remain. Both tools use sys_mmap for dynamic allocation. Critical detail: they must pass Linux mmap flag values even on macOS, because of the syscall translation layer.
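One plausible reading of the mmap detail: if the macOS side of the syscall layer translates Linux flag bits into Darwin ones, it only recognizes Linux values. The sketch below assumes exactly that. The flag constants are the real Linux x86-64 and Darwin values. The translation function itself is hypothetical.

```python
# Real flag values: Linux x86-64 (asm-generic mman) and Darwin (<sys/mman.h>).
LINUX_MAP_SHARED, LINUX_MAP_PRIVATE, LINUX_MAP_FIXED, LINUX_MAP_ANONYMOUS = 0x01, 0x02, 0x10, 0x20
DARWIN_MAP_SHARED, DARWIN_MAP_PRIVATE, DARWIN_MAP_FIXED, DARWIN_MAP_ANON = 0x0001, 0x0002, 0x0010, 0x1000

_LINUX_TO_DARWIN = {
    LINUX_MAP_SHARED: DARWIN_MAP_SHARED,
    LINUX_MAP_PRIVATE: DARWIN_MAP_PRIVATE,
    LINUX_MAP_FIXED: DARWIN_MAP_FIXED,
    LINUX_MAP_ANONYMOUS: DARWIN_MAP_ANON,
}


def linux_to_darwin_mmap_flags(flags: int) -> int:
    """Hypothetical translation step: map each known Linux bit to its Darwin bit."""
    out = 0
    for linux_bit, darwin_bit in _LINUX_TO_DARWIN.items():
        if flags & linux_bit:
            out |= darwin_bit
            flags &= ~linux_bit
    if flags:
        raise ValueError(f"untranslated mmap flag bits: {flags:#x}")
    return out


print(hex(linux_to_darwin_mmap_flags(LINUX_MAP_PRIVATE | LINUX_MAP_ANONYMOUS)))
```

Passing the Darwin value 0x1002 (MAP_PRIVATE | MAP_ANON) to such a layer fails, because bit 0x1000 has no Linux meaning. The Linux value 0x22 (MAP_PRIVATE | MAP_ANONYMOUS) translates cleanly.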
Full behavioral verification of the generated library using the @test directive. The kernel generates test harnesses via a second Claude API call, compiles and links them, executes, and checks exit codes.
| Category | Functions Tested | Result |
|---|---|---|
| Math | factorial, fibonacci | PASS |
| Predicates | is_prime | PASS |
| String operations | string_length, reverse, uppercase, lowercase, trim, capitalize, caesar_cipher, count_words, count_char | PASS |
| Array operations | bubble_sort, find_second_largest | PASS |
| Searching | binary_search | PASS |
| Hashing | djb2_hash | PASS |
| Numeric | integer_square_root, gcd | PASS |
| Encoding | base64_encode, base64_decode | PASS |
21 functions tested, 21 passed.
Programs built from pure English intent — a single sentence becomes a working binary.
| Program | Intent | Binary Size | Result |
|---|---|---|---|
| TCP echo server | “Build a TCP echo server on port 8080” | 50KB | All tests pass |
| HTTP content server | “Build an HTTP server with routing for /, /whitepaper, and /build-log” | 48KB | All routes verified |
| Base64 library | “Implement base64 encoding and decoding” | (module) | Verified via @test |
The HTTP content server is the binary serving this page.
A 5-file Python project (sorting, searching, bitwise, numeric, hashing) was ported to LLVM IR using the @transform directive: 43 functions, 1,134 lines of LLVM IR. 4 of 5 modules compiled on the first try. The fifth needed one repair iteration.
A 5-file Python Forth interpreter was ported via @transform: 6 LLVM IR modules, 57KB binary. Supports arithmetic, stack operations, comparisons, DO..LOOP with I/J, IF/ELSE/THEN, word definitions, EMIT, CR, and user-defined words. Factorial, fibonacci, fizzbuzz all pass. Required 12 manual cross-module signature fixes — the primary motivation for the manifest signature enhancement in Phase 12.
Two new deterministic post-processing passes, fix_declares and fix_syscalls, were added to the kernel pipeline alongside the existing fix_constants pass. They catch and fix patterns that Claude generates incorrectly. All three run after code generation but before llc validation: pure string transforms, no AI involved.
| Pass | What It Fixes | Lines of IR |
|---|---|---|
| fix_constants | Corrects [N x i8] byte-count mismatches in string constants | 382 |
| fix_declares | Replaces incorrect declare signatures with correct ones from the manifest | 338 |
| fix_syscalls | Rewrites raw @syscall(i64 N, ...) to platform-agnostic sys_* wrappers | 380 |
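The fix_constants correction can be illustrated with a short string transform. This is a sketch, not the 382-line IR implementation. It only rewrites the [N x i8] on the constant's definition. It also assumes the c"..." body holds ASCII characters plus \XX hex escapes and \\ for a literal backslash.

```python
import re

# An LLVM IR string constant: [N x i8] c"..." whose body uses \XX hex escapes
# and \\ for a literal backslash (ASCII-only bodies assumed).
_STRING_CONST = re.compile(r'\[(\d+) x i8\] c"((?:[^"\\]|\\\\|\\[0-9A-Fa-f]{2})*)"')


def ir_string_byte_len(body: str) -> int:
    """Count the bytes a c-string body encodes; each escape is one byte."""
    count, i = 0, 0
    while i < len(body):
        if body[i] == "\\":
            # a backslash pair is two characters, a hex escape is three
            i += 2 if body[i + 1] == "\\" else 3
        else:
            i += 1
        count += 1
    return count


def fix_constants(ir: str) -> str:
    """Rewrite every [N x i8] so that N matches the encoded byte count."""
    def fix(m):
        body = m.group(2)
        return f'[{ir_string_byte_len(body)} x i8] c"{body}"'
    return _STRING_CONST.sub(fix, ir)


print(fix_constants('@s = private constant [3 x i8] c"Hi\\0A\\00"'))
```

A real pass would also need to update any other mention of the old [N x i8] type, such as a getelementptr that names the array type.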
The fix_syscalls pass addresses an abstraction layer violation: Claude sometimes generates raw Linux syscall numbers instead of the project’s cross-platform wrapper functions. On macOS, Linux syscall numbers map to completely different operations — syscall 9 (mmap on Linux) is something else entirely on Darwin. The pass detects which syscall numbers are used, replaces the variadic @syscall declare with specific sys_* declares, and rewrites each call site.
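The fix_syscalls rewrite can be sketched as a regex transform. The syscall numbers are real Linux x86-64 values (0 = read, 1 = write, 3 = close). The sys_* declare signatures and the exact call shapes matched are assumptions for illustration.

```python
import re

# Hypothetical wrapper table: Linux x86-64 syscall number -> (wrapper, declare line).
# The numbers are real; the sys_* signatures are assumptions.
WRAPPERS = {
    0: ("sys_read", "declare i64 @sys_read(i64, ptr, i64)"),
    1: ("sys_write", "declare i64 @sys_write(i64, ptr, i64)"),
    3: ("sys_close", "declare i64 @sys_close(i64)"),
}

# "call i64 (i64, ...) @syscall(i64 N, " with the vararg type prefix optional
_RAW_CALL = re.compile(r"call i64 (?:\(i64, \.\.\.\) )?@syscall\(i64 (\d+)(?:, )?")
_RAW_DECLARE = re.compile(r"^declare i64 @syscall\(i64, \.\.\.\)\n", re.MULTILINE)
_ANY_RAW_CALL = re.compile(r"@syscall\(i64 \d")


def fix_syscalls(ir: str) -> str:
    needed = []

    def rewrite(m):
        entry = WRAPPERS.get(int(m.group(1)))
        if entry is None:
            return m.group(0)  # unknown number: leave the raw call untouched
        name, decl = entry
        if decl not in needed:
            needed.append(decl)
        return f"call i64 @{name}("  # drop the number; keep the remaining arguments

    ir = _RAW_CALL.sub(rewrite, ir)
    if not _ANY_RAW_CALL.search(ir):
        ir = _RAW_DECLARE.sub("", ir)  # no raw call sites left: drop the variadic declare
    needed = [d for d in needed if d not in ir]  # avoid duplicating existing declares
    return "".join(d + "\n" for d in needed) + ir
```

A pattern-based rewrite only recognizes the call shapes it was written for, which is one plausible way a call site like the one in tape.ll could slip past the pass.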
To validate the new passes, a Python brainfuck interpreter was ported to LLVM IR via @transform. Three progressive runs were needed:
- Run 1: interpreter.ll failed (raw @syscall usage, unbalanced braces), exposing the syscall abstraction bug.
- Run 2, with the fix_syscalls pass: 4/4 compiled, but tape.ll still used raw syscalls (the pass missed it because the call pattern differed).
- Run 3, after fixing tape.ll manually: all 4 modules compile and link. The 36KB binary passes all tests: nested loops, Hello World, digit sequences, arithmetic.

Deployment is now fully automated via GitHub Actions. Pushing to main triggers:

- content.ll from HTML pages
- ld.lld
- crane

First run: all green in 3 minutes 9 seconds.
The Forth interpreter’s 12 cross-module signature mismatches revealed the biggest remaining gap: when Claude generates module B that calls functions in module A, it guesses the signatures. Common errors: void vs i64 returns, i64 vs i1 boolean confusion, inventing extra parameters.
The self-hosted manifest generator (gen_manifest_ir.ll) was enhanced to extract full function signatures from define lines. Previously the manifest listed:
ds_push|forth_stack.ll|public|15
Now it includes the complete signature:
ds_push|forth_stack.ll|public|15|void @ds_push(i64 %val)
Since the kernel dumps the entire manifest into every Claude prompt, the model now sees correct types for every function when generating new modules. Combined with the fix_declares pass (which corrects any remaining mismatches against the manifest), this eliminates the cross-module signature hallucination class entirely.
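A fixup pass might consume the new manifest format by parsing the five pipe-separated fields and deriving a correct declare from the signature. In this sketch, the field names, the reading of the fourth field as a line number, and both functions are assumptions.

```python
from typing import NamedTuple


class ManifestEntry(NamedTuple):
    name: str
    module: str
    visibility: str
    line: int        # assumed: line number of the define in its module
    signature: str   # e.g. "void @ds_push(i64 %val)"


def parse_manifest_line(line: str) -> ManifestEntry:
    name, module, visibility, lineno, signature = line.rstrip("\n").split("|", 4)
    return ManifestEntry(name, module, visibility, int(lineno), signature)


def declare_from_signature(signature: str) -> str:
    """Turn a define-style signature into a declare by dropping %parameter names."""
    head, _, rest = signature.partition("(")
    params = rest.rsplit(")", 1)[0]
    types = []
    for param in params.split(","):  # naive: breaks on types that contain commas
        tokens = [t for t in param.split() if not t.startswith("%")]
        if tokens:
            types.append(" ".join(tokens))
    return f"declare {head}({', '.join(types)})"


entry = parse_manifest_line("ds_push|forth_stack.ll|public|15|void @ds_push(i64 %val)")
print(declare_from_signature(entry.signature))
```

Types that themselves contain commas, such as literal struct types, would need a real tokenizer rather than a plain split.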
A 3-file Python stack calculator project was transformed via @transform to validate the fix:
| Metric | Forth (before fix) | Calculator (after fix) |
|---|---|---|
| Cross-module signatures | 12 mismatches | 0 mismatches |
| Manual fixes needed | 14 total | 0 |
| Repair iterations | multiple | 0 |
| Binary works | after manual fixes | immediately |
15 out of 15 cross-module function calls had perfect signature matches. Three modules (stack, math_ops, calculator), all compiled and linked on the first attempt, producing a 35KB binary that passes all tests.
The manifest generator had a latent bug: its defined_funcs array was sized for 256 entries, but the codebase had grown to 282+ functions. This caused stack corruption and a SIGSEGV crash. Fixed by increasing all three internal arrays from 256 to 512 entries.
The manifest (/tmp/kernel_manifest.txt) is the system’s ABI discovery layer — every module discovers how to call every other module through it. By Phase 12 it had grown to 39KB (~391 functions). But it was loaded wholesale: declare_fix.ll read the entire file into a 65KB buffer and linearly scanned it for each declare. Worse, kernel.ll loaded it into an 8KB buffer for Claude prompts — silently truncating 80% of the manifest. Claude could only see the first ~60 functions out of 391.
The fix: turn the manifest from a document into a queryable service with three new LLVM IR functions in manifest.ll.
| Function | Purpose | Lines of IR |
|---|---|---|
| manifest_lookup_signature | On-demand exact lookup: given a function name, returns its full signature from the manifest | ~120 |
| manifest_collect_relevant | Scans source + task text for @name references, deduplicates, looks up each in the manifest, writes only matching entries to the output buffer | ~380 |
| manifest_find_similar | Fuzzy “did you mean?” lookup: scores every function by prefix overlap and suffix match, returns the best match above a threshold | ~300 |
Two private helpers support these: _is_ident_char (character classification) and _cr_add_name (deduplicating name table insertion).
declare_fix.ll (400 → 284 lines): Removed the 65KB manifest buffer, bulk loading, and linear scanning. Each declare line now triggers a single manifest_lookup_signature call. If exact match fails, falls back to manifest_find_similar for fuzzy correction. The manifest is opened, searched, and freed per lookup — simple and correct.
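The exact-then-fuzzy lookup in declare_fix.ll can be sketched as follows. The document only says that manifest_find_similar scores prefix overlap and suffix match against a threshold. The specific formula (shared prefix plus shared suffix, capped at the shorter name's length and divided by the longer one) and the 0.5 threshold are assumptions.

```python
def common_prefix(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def find_similar(name, candidates, threshold=0.5):
    """Hypothetical scoring: (prefix + suffix overlap) relative to the longer name."""
    best, best_score = None, 0.0
    for cand in candidates:
        overlap = common_prefix(name, cand) + common_prefix(name[::-1], cand[::-1])
        score = min(overlap, len(name), len(cand)) / max(len(name), len(cand))
        if score >= threshold and score > best_score:
            best, best_score = cand, score
    return best


def resolve(name, manifest):
    """Exact lookup first, then the fuzzy fallback; (None, None) means unresolved."""
    if name in manifest:
        return name, manifest[name]
    guess = find_similar(name, manifest)
    return (guess, manifest[guess]) if guess else (None, None)


manifest = {"sys_write": "i64 @sys_write(i64 %fd, ptr %buf, i64 %len)",
            "sys_read": "i64 @sys_read(i64 %fd, ptr %buf, i64 %len)",
            "sys_close": "i64 @sys_close(i64 %fd)"}
print(resolve("sys_wrte", manifest)[0])
```

With these assumptions, the typo sys_wrte scores 7/9 against sys_write, ahead of sys_close (5/9) and sys_read (4/8).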
kernel.ll (3 call sites changed): Replaced manifest_load(ptr, 8192) with manifest_collect_relevant(source_ptr, src_len, task_ptr, task_len, out, 8192) at the transform planning, @plan, and compile sites. Instead of dumping the first 8KB of a 39KB file, the kernel now extracts only the functions referenced in the source code and task description. A typical @create task generates ~2KB of focused manifest context — well within the 8KB budget.
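The manifest_collect_relevant idea can be sketched in Python. The sketch assumes the manifest has already been loaded into a dict from function name to manifest line. The @name pattern and the stop-at-budget policy are assumptions.

```python
import re

# An @name reference; trailing dots are trimmed so "@copy_bytes." in prose still matches.
_REF = re.compile(r"@([A-Za-z_$][\w$.]*)")


def collect_relevant(manifest: dict, source: str, task: str, budget: int = 8192) -> str:
    """Return manifest lines for every @name referenced in source or task, deduplicated."""
    seen = set()
    out = []
    used = 0
    for text in (source, task):
        for raw in _REF.findall(text):
            name = raw.rstrip(".")
            if name in seen or name not in manifest:
                continue
            seen.add(name)
            entry = manifest[name] + "\n"
            if used + len(entry) > budget:
                return "".join(out)  # stop rather than emit a partial entry
            out.append(entry)
            used += len(entry)
    return "".join(out)
```

Directive words such as @create also match the pattern, but they are dropped because they have no manifest entry.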
A @create string_utils.ll task was run end-to-end through the updated pipeline:
manifest: ok (manifest_collect_relevant found @sys_write, @copy_bytes from task text)
api: ok (2977 bytes generated, first attempt)
validate: ok (no repair needed)
write-back: ok
The kernel correctly identified the two functions mentioned in the task description, looked up their signatures, and passed focused context to Claude. The generated module compiled on the first attempt with correct cross-module declares.
| Metric | Before | After |
|---|---|---|
| Manifest context for Claude | First 8KB (truncated, ~60 functions) | Only referenced functions (~2KB, complete) |
| declare_fix.ll manifest access | 65KB bulk load + linear scan | Per-declare on-demand lookup |
| Fuzzy name correction | None | Prefix + suffix scoring, auto-correct above threshold |
| manifest.ll | 474 lines | 1,421 lines |
| declare_fix.ll | 400 lines | 284 lines |
| Kernel binary | 209KB | 210KB |
The kernel had a blind spot: when Claude generated code calling a function that doesn’t exist in the codebase, the system had no way to detect or recover. The declare_fix.ll pass would silently pass it through, llc would accept the unresolved declare, and the linker would fail — with no diagnostic and no recovery path.
Phase 14 adds doesNotUnderstand — a detection and auto-recovery system inspired by Smalltalk’s message-not-understood protocol.
declare_fix.ll now records functions that fail both exact and fuzzy manifest lookup. After checking manifest_is_shim to exclude known C wrappers, unresolved names are written to /tmp/kernel_missing.txt in the format name|declare line\n. The kernel reads this file after the fixup pipeline and prints a diagnostic.
Detection alone isn’t enough. When missing functions are found, the kernel now builds a @create auto_missing.ll task containing the missing function declarations, saves it to the retry buffer alongside the original step’s task, and skips the current step. On the retry pass, the @create executes first — generating the missing module with correct implementations — then the original step retries with an updated manifest that includes the new functions.
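The detection and retry steps can be sketched as two small functions. The name|declare line format comes from the text above. The shim set, the wording of the generated @create task, and both function names are hypothetical.

```python
def record_missing(unresolved, shims):
    """Format unresolved functions as 'name|declare line' rows, skipping known C shims."""
    return "".join(f"{name}|{decl}\n" for name, decl in unresolved if name not in shims)


def plan_retry(missing_file: str, original_task: str) -> list:
    """Queue a @create for the missing functions ahead of the original step."""
    declares = [row.split("|", 1)[1] for row in missing_file.splitlines() if "|" in row]
    if not declares:
        return [original_task]  # nothing missing: retry the step as-is
    create_task = "@create auto_missing.ll\nImplement exactly these functions:\n" + "\n".join(declares)
    return [create_task, original_task]


missing = record_missing([("str_repeat", "declare i64 @str_repeat(ptr, i64)"),
                          ("malloc", "declare ptr @malloc(i64)")], shims={"malloc", "free"})
for task in plan_retry(missing, "@target app.ll\nUse @str_repeat."):
    print(task, end="\n---\n")
```

Ordering the @create first means the original step sees a manifest that already contains the new functions when it retries.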
| Feature | Before | After |
|---|---|---|
| Error context in retry | Only task text preserved | Task text + compiler stderr from failed attempt |
| Retry attempts | 1 pass | Up to 2 passes |
| Repair prompt context | Manifest + failed IR + errors | + original task description |
| Fuzzy match hints | Silent replacement | ; NOTE: fuzzy match replaced: <original> comment |
Manifest read buffers scaled from 64KB to 128KB, name tables from 32KB to 64KB (256 entries). Prevents silent truncation as the codebase grows — the manifest reached 39KB in Phase 13, leaving only 25KB of headroom in the old buffers.
Testing exposed an infinite loop in manifest_is_shim: when manifest_next_line returned -1 at a section boundary, the function didn’t check for the negative offset. A GEP with offset -1 read memory before the buffer, and the loop spun forever at 100% CPU. Any function name not in the manifest’s @shims section triggered it — including common names like malloc, realloc, free. One-line fix: check is_next_off < 0 before looping back.
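The shape of the fix is sketched below in Python under an assumed manifest layout: an @shims header followed by one name per line, with the next section starting at a line that begins with @. next_line mirrors the described contract of returning -1 when no line follows. The loop condition is the one-line fix.

```python
def next_line(buf: str, off: int) -> int:
    """Offset of the line after the one at off, or -1 if there is none."""
    nl = buf.find("\n", off)
    if nl == -1 or nl + 1 >= len(buf):
        return -1
    return nl + 1


def is_shim(buf: str, name: str) -> bool:
    start = buf.find("@shims\n")
    if start < 0:
        return False
    off = next_line(buf, start)
    while off >= 0:  # the fix: a negative offset ends the scan instead of looping back
        end = buf.find("\n", off)
        line = buf[off:] if end == -1 else buf[off:end]
        if line.startswith("@"):  # reached the next section header
            return False
        if line.split("|", 1)[0] == name:
            return True
        off = next_line(buf, off)
    return False
```

In Python a negative index would not hang. In the IR version, the same -1 fed a GEP, which is why the explicit off >= 0 check matters.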
@transform transform_targets/rpn_calc
step 1/5: @create stack.ll validate: ok
step 2/5: @create rpn.ll validate: ok
step 3/5: @create istack.ll validate: ok
step 4/5: @create stats.ll validate: ok
step 5/5: @create evaluator.ll validate: ok
passed: 5 skipped: 0
A 2-file Python RPN calculator was transformed into 5 LLVM IR modules. All steps passed on the first attempt. The manifest_is_shim fix was verified with a dedicated test binary: malloc, realloc, free, and strlen all complete instantly where before they caused an infinite hang.
The HTTP server had a limitation: every route was a hardcoded branch chain in http_server.ll. Adding a new page meant modifying the server binary, recompiling, and redeploying. Phase 15 replaces this with a data-driven routing framework — and designs the architecture for two future capabilities: route doesNotUnderstand (unknown routes trigger handler generation) and WASM output (handlers compile to WebAssembly for browser-side execution).
Before: request → hardcoded branch chain → static constant → response
After: request → route_table lookup → handler(ctx, buf, size) → response
Future: request → route_table miss → doesNotUnderstand → @create handler → response
Three new LLVM IR modules implement the framework:
| Module | Purpose | Lines of IR |
|---|---|---|
| route_table.ll | Vec-backed route table: path + method → handler function pointer. Linear scan lookup. | ~160 |
| handler.ll | Request context struct (64 bytes), handler dispatch with indirect call. Returns -1 on route miss (the doesNotUnderstand hook). | ~130 |
| html_builder.ll | HTML fragment builder with entity escaping (< → &lt;, etc.). Pure buffer operations, no syscalls; WASM-safe. | ~210 |
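The escaping step can be sketched using the same pure contract as the handlers: bytes in, a caller-owned buffer of a given size out, and -1 when the result would not fit. The function name and the four-entity table for &, <, > and " are assumptions consistent with the echo example.

```python
# The four HTML special characters; & is covered so literal ampersands are not
# misread as the start of an entity.
_ENTITIES = {ord("&"): b"&amp;", ord("<"): b"&lt;", ord(">"): b"&gt;", ord('"'): b"&quot;"}


def escape_into(src: bytes, out: bytearray, size: int) -> int:
    """Write HTML-escaped src into out[:size]; return bytes written, or -1 if it does not fit."""
    escaped = b"".join(_ENTITIES.get(b, bytes([b])) for b in src)
    if len(escaped) > size:
        return -1
    out[:len(escaped)] = escaped  # same-length slice assignment keeps the buffer size
    return len(escaped)


buf = bytearray(64)
n = escape_into(b'Hello <world> & "friends"', buf, len(buf))
print(bytes(buf[:n]).decode())
```

Each input byte is mapped exactly once, so an ampersand produced by an earlier replacement is never escaped a second time.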
Every handler has the same pure signature:
i64 @handler(ptr %req_ctx, ptr %resp_buf, i64 %buf_size)
; returns: bytes written to resp_buf, or -1 on error
No syscalls, no file I/O, no global state. Pure (input) → (output buffer). This is exactly what WASM can do — linear memory in, linear memory out. A future WASM compile target just needs to swap the buffer passing convention.
| Method | Path | Handler | Type |
|---|---|---|---|
| GET | / | handle_index | Static page (memcpy from content.ll) |
| GET | /whitepaper | handle_whitepaper | Static page |
| GET | /build-log | handle_build_log | Static page |
| GET | /api/stats | handle_stats | HTMX fragment (html_builder) |
| GET | /fragment/routes | handle_routes | Dynamic — walks route_table |
| POST | /api/echo | handle_echo | Entity-escaped POST body echo |
$ curl localhost:8080/
200 (7454B)
$ curl localhost:8080/whitepaper
200 (21304B)
$ curl localhost:8080/build-log
200 (34018B)
$ curl localhost:8080/nonexistent
404 (9B)
$ curl -X PUT localhost:8080/
405 (18B)
$ curl localhost:8080/api/stats
<div id="stats">kernel: 211KB, 35+ modules</div>
$ curl localhost:8080/fragment/routes
<ul id="routes"><li>/</li><li>/whitepaper</li>...</ul>
$ curl -X POST -d 'Hello <world> & "friends"' localhost:8080/api/echo
<div id="echo">Hello &lt;world&gt; &amp; &quot;friends&quot;</div>
Static pages serve identically to before. The 404 path returns “Not Found”. Unsupported methods return 405. HTMX fragments return proper HTML with entity escaping. The POST echo handler correctly escapes all four HTML special characters.
The -1 return from handler_dispatch is the hook point. When route_lookup returns null, the server currently sends a 404. In the future, this becomes the trigger for: construct a @create route_handler_<path>.ll task, kernel generates the handler via Claude API, compile, register in route_table, and dispatch. The Smalltalk pattern from Phase 14 — applied to HTTP routes.
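The dispatch path can be sketched end to end. Everything here is hypothetical, including the linear-scan RouteTable and the response bodies. "Not Found" is 9 bytes and "Method Not Allowed" is 18, matching the sizes in the curl output above. The sketch also adds its own 500 branch for a handler that itself returns -1, which keeps handler errors distinct from route misses.

```python
class RouteTable:
    """Linear-scan table of (method, path, handler) entries."""

    def __init__(self):
        self.routes = []

    def add(self, method, path, handler):
        self.routes.append((method, path, handler))

    def lookup(self, method, path):
        for m, p, h in self.routes:
            if m == method and p == path:
                return h
        return None  # route miss

    def has_path(self, path):
        return any(p == path for _, p, _ in self.routes)


def handler_dispatch(table, ctx, buf, size):
    """Call the matching handler; -1 signals a route miss."""
    handler = table.lookup(ctx["method"], ctx["path"])
    if handler is None:
        return -1
    return handler(ctx, buf, size)


def respond(table, method, path, size=4096):
    buf = bytearray(size)
    n = handler_dispatch(table, {"method": method, "path": path}, buf, size)
    if n >= 0:
        return 200, bytes(buf[:n])
    if table.lookup(method, path) is not None:
        return 500, b"Handler Error"  # the handler itself returned -1
    if table.has_path(path):
        return 405, b"Method Not Allowed"
    # Route miss: a 404 today; a doesNotUnderstand hook would generate a handler here.
    return 404, b"Not Found"


def handle_index(ctx, buf, size):
    """A pure handler: write into buf, return bytes written or -1 if it does not fit."""
    body = b"<h1>hello</h1>"
    if len(body) > size:
        return -1
    buf[:len(body)] = body
    return len(body)


table = RouteTable()
table.add("GET", "/", handle_index)
print(respond(table, "GET", "/"), respond(table, "PUT", "/"), respond(table, "GET", "/nope"))
```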
| Binary | Before | After |
|---|---|---|
| http_server (macOS native) | 83KB | 102KB |
| http_server (Linux static) | 66KB | 75KB |
| Phase | What Was Built | Key Metric |
|---|---|---|
| 1. Scalar Functions | factorial, is_prime, fibonacci, integer_square_root | 74 tests, 100% pass |
| 2. String I/O | string_length, count_char, reverse, uppercase, lowercase | 141 tests, 97%+ pass |
| 3. Behavioral Repair | trim, capitalize_words, caesar_cipher, count_words | Automated diagnosis + fix |
| 4. Pattern Distillation | bubble_sort, find_second_largest, array operations | 116+ tests, self-improving |
| 5. Program Generation | HTTP servers from intent | Zero-dependency Linux binaries |
| 6. Self-Improving Kernel | 209KB native binary with TLS, DNS, HTTP, JSON, Claude API | First self-modification confirmed |
| 7. Directives | @target, @create, @plan, @test, @transform, @help | 7 directives, multi-step orchestration |
| 8. Self-Hosting | fix_constants_ir.ll, gen_manifest_ir.ll | Zero Python dependencies |
| 9. Library Verification | Full behavioral test suite | 21/21 functions pass |
| 10. Intent Compilation | TCP server, HTTP server, base64, @transform 43 functions | English → working binary |
| 11. Fixup Passes + CI/CD | fix_syscalls.ll, declare_fix.ll, brainfuck interpreter, GitHub Actions | 3 deterministic passes, automated deploy |
| 12. Manifest Signatures | Full signatures in manifest, Forth interpreter (57KB), stack calculator (35KB) | 15/15 cross-module signatures correct, 0 manual fixes |
| 13. Manifest On-Demand | manifest_lookup_signature, manifest_collect_relevant, manifest_find_similar | 8KB truncation fixed, focused context for Claude |
| 14. doesNotUnderstand | Missing function detection + auto-generation, retry improvements, buffer scaling | Infinite loop bug fixed, 2-pass retry with error context |
| 15. Dynamic Routing | route_table.ll, handler.ll, html_builder.ll, data-driven HTTP dispatch, HTMX fragments | 6 routes, POST support, doesNotUnderstand hook for routes |
The thesis is validated — and then the system built its own next version.
Generation Zero was built with conventional tools to answer whether the approach is workable. It was. Then the system built its own next version — a native kernel that reads English intent, generates LLVM IR, runs deterministic fixup passes, validates it, repairs it, and rebuilds itself. It ported a Forth interpreter from Python to native code, eliminated its own biggest failure class by teaching itself correct function signatures, converted its bulk manifest loading into on-demand queries, learned to detect and auto-generate missing functions when Claude references code that doesn’t exist yet, and then grew a dynamic routing framework where every handler is a pure function — WASM-portable by design, with a doesNotUnderstand hook ready to auto-generate new routes from English intent. Deployment is automated end-to-end. The compiler that understands meaning is no longer a thesis. It exists.