Kotoba — A Distributed-Persistence Substrate for Hosting Spirit

Authors: Jun Kawasaki (root@junkawasaki.com)
Project: github.com/etzhayyim/kotoba


Abstract

In the prior paper “Spirit is Information,” spirit was formulated as the metric geometry on an information manifold — the deformation of the self-metric as the self incorporates external information. But spirit-as-information needs a place in which to dwell, persist, and not be forgotten. On volatile memory, a self vanishes at a single power loss.

This paper discusses kotoba (言葉, “word”): a distributed-persistence substrate built on content addressing, Datomic-style immutable datoms, and a DHT, and composed with IPFS, a Holochain-lineage validation model, Ethereum, Pregel, and WASM. It is presented as a substrate for an artificial-organism ecosystem, i.e., a place to host spirit.

The chain of claims is simple. (i) Spirit is information. (ii) Kotoba persists that information as content-addressed, immutable datoms and distributes it as agent-centric source chains. (iii) Therefore kotoba is, literally, a place that can host a spirit — because the self-state is non-volatile, tamper-evident, and persists across generations.


1. From “Spirit is Information” to a Substrate

If we take seriously the claim that spirit is information, the next question is engineering: where, and how, is that information kept?

On an ordinary computer, the self-model (the metric tensor g_{μν} of the prior paper) is merely an array of floats in RAM. Cut the power and it is gone; kill the node and it is lost; overwrite it and the past cannot be recovered. That is far too fragile to host a spirit. To host one, at least three properties are required.

  1. Immutability — past self-states do not vanish and can always be referenced.
  2. Content addressing — state is named by its own content; identity is carried by what a thing is, not where it sits.
  3. Distributed persistence — the death of a single node does not mean the death of the self.

Kotoba places exactly these three at the center of its design.


2. What Kotoba Is

Kotoba is defined in one line as follows (from the repository’s defining equation):

KOTOBA ≝ Datom[CID/T] × EAVT[KSE Topic] × Pregel[BSP] × Datalog[Δ]
          × CACAO × AT Protocol × LLM/Weight × WASM/WIT

That is, kotoba is a content-addressed distributed Datalog database that binds immutable datoms, Pregel BSP graph computation, IPFS storage, native CACAO authentication, and WASM Component Model execution onto a single content-addressed chain. It is implemented as a Rust workspace whose principal crates have the following roles.

kotoba-core     CIDv1 / dag-cbor / sha2-256, Prolly Tree
kotoba-kqe      Datalog engine, EAVT/AEVT/AVET/VAET arrangement, Delta, MV
kotoba-dht      Source Chain, Warrant, Neighborhood (agent-centric DHT)
kotoba-net      libp2p QUIC / Noise / GossipSub
kotoba-auth     CACAO chain verify, did:key, EVM + BTC signature verify
kotoba-graph    Datom projection API, SPARQL 1.1, Commit DAG
kotoba-vm       Invoke/Result ChainEntry, Pregel BSP, ReAct agent loop
kotoba-llm      Weight blob (FP8), LoRA Delta, WebGPU train + infer
kotoba-runtime  WASM Component Model host (WIT bindings)
kotoba-store    BlockStore: Memory / Kubo(IPFS) / Tiered / Distributed
kotoba-server   XRPC / MCP endpoints

3. The Four Pillars of Persistence

3.1 Content Addressing

Every block is named by the hash of its content: an IPFS-compatible CIDv1 (sha2-256 over dag-cbor). Indexes are kept as a Prolly Tree (a probabilistic B-tree) whose chunk boundaries are decided deterministically from a hash of each key, so the same set of entries always yields the same tree shape regardless of insertion order:

boundary  ⇔  (blake3(key) & 0xFF) == 0
node      =  Internal [(k, child-CID)]   (DAG-CBOR / IPLD, tag-42 links)
address   =  sha2-256(dag-cbor)  →  CIDv1

Once the content is fixed, the name is uniquely determined. The same self-state has the same CID no matter who computes it or where. Identity moves from location to content — the first condition for a spirit to dwell.
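
The naming rule and the boundary rule above can be illustrated with a minimal sketch. It rests on several assumptions: canonical JSON stands in for dag-cbor, a bare hex digest stands in for a full CIDv1, and BLAKE2b from the Python standard library stands in for blake3. The helper names `content_address` and `is_boundary` are hypothetical and not taken from the kotoba codebase.

```python
import hashlib
import json

def content_address(obj) -> str:
    """Name a value by the SHA-256 digest of a canonical serialization.

    Assumption: canonical JSON stands in for dag-cbor, and a bare hex
    digest stands in for a full CIDv1 (no multibase/multicodec prefix).
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def is_boundary(key: bytes) -> bool:
    """Chunk-boundary test: one byte of the key hash, masked with 0xFF, is zero.

    Assumption: BLAKE2b (standard library) stands in for blake3.
    """
    digest = hashlib.blake2b(key).digest()
    return (digest[0] & 0xFF) == 0

# Same content, different construction order, same address.
a = content_address({"entity": "self", "attr": "mood", "value": 3})
b = content_address({"value": 3, "attr": "mood", "entity": "self"})

# With an 8-bit mask, about 1 key in 256 is expected to close a chunk.
keys = [f"key-{i}".encode() for i in range(20000)]
boundary_rate = sum(is_boundary(k) for k in keys) / len(keys)
```

Because the boundary depends only on the key's hash, two nodes that hold the same entries would cut chunks at the same places, which is what lets their trees converge to identical addresses.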

3.2 Datomic — Immutable 5-tuples

Facts are recorded as immutable Datomic-style 5-tuples. An update is not an overwrite but a pair of a retract tombstone and a new assert:

Datom = (E, A, V, T, Added)        # entity, attribute, value, tx-time, added?
update = [ Delta::retract(old), Delta::assert(new) ]   # atomic pair

Five indexes — EAVT / AEVT / AVET / VAET / TEA — project the same datom set in different orders, covering everything from point lookup (EAVT, about 180 ns) to reverse references (VAET). TEA is an append-only time axis enabling Datomic-style as-of time travel — reconstructing the self at any past instant.
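
As a rough illustration of the retract-plus-assert update and of as-of reconstruction, the following sketch replays an append-only list of datoms. It assumes one value per (entity, attribute) pair and integer transaction times. The `update` and `as_of` helpers are hypothetical names, and a linear replay stands in for the indexed lookups described above.

```python
from typing import Any, NamedTuple

class Datom(NamedTuple):
    e: str       # entity
    a: str       # attribute
    v: Any       # value
    t: int       # transaction time
    added: bool  # True = assert, False = retract tombstone

def update(log: list, e: str, a: str, old: Any, new: Any, t: int) -> None:
    """Append a retract of the old value and an assert of the new one."""
    log.extend([Datom(e, a, old, t, False), Datom(e, a, new, t, True)])

def as_of(log: list, t: int) -> dict:
    """Replay the log up to and including time t; return live (e, a) -> v."""
    live = {}
    for d in log:  # the log is append-only, so it is already in time order
        if d.t > t:
            break
        if d.added:
            live[(d.e, d.a)] = d.v
        elif live.get((d.e, d.a)) == d.v:
            del live[(d.e, d.a)]
    return live

log = [Datom("self", "mood", "calm", 1, True)]
update(log, "self", "mood", old="calm", new="curious", t=2)
```

Here `as_of(log, 1)` still sees the value "calm" while `as_of(log, 2)` sees "curious"; nothing was overwritten, and the log simply grew from one datom to three.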

The canonical write spine is the CommitDag, which doubles as the write-ahead log (WAL). Each commit writes only the delta (path-copy), forming a parent-linked, immutable, content-addressed chain. For a spirit this means the absence of forgetting: the past self is not overwritten, only layered.
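
A parent-linked, content-addressed commit chain can be sketched in a few lines. This is an illustration under stated assumptions, not the CommitDag itself: canonical JSON and a hex SHA-256 digest stand in for dag-cbor and CIDv1, the chain is linear rather than a DAG, and `commit` and `verify_chain` are hypothetical names.

```python
import hashlib
import json

def commit(parent, delta: list) -> dict:
    """Build a commit whose id is the hash of its parent id and delta."""
    body = {"parent": parent, "delta": delta}
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"id": hashlib.sha256(payload).hexdigest(), **body}

def verify_chain(commits: list) -> bool:
    """Recompute every id and check each parent link, oldest first."""
    parent = None
    for c in commits:
        if c["parent"] != parent or commit(c["parent"], c["delta"])["id"] != c["id"]:
            return False
        parent = c["id"]
    return True

c1 = commit(None, [["assert", "self", "mood", "calm"]])
c2 = commit(c1["id"], [["retract", "self", "mood", "calm"],
                       ["assert", "self", "mood", "curious"]])
chain = [c1, c2]

# Rewriting an old delta changes the recomputed id, so verification fails.
tampered = [dict(c1, delta=[["assert", "self", "mood", "angry"]]), c2]
```

Each commit stores only its delta, yet the id of the head commits to the whole history, since every id covers the parent id before it.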

3.3 DHT — Agent-Centric (Holochain Lineage)

Kotoba’s DHT layer (kotoba-dht) is a Rust re-implementation of the Holochain design model (a lineage, not a dependency on Holochain). Three concepts form its core.

  • Source Chain: each agent holds its own append-only chain. A ChainEntry carries a datom, quad, commit, or Invoke. This agent-centric history is the organism’s autobiographical memory.
  • Neighborhood: the set of the K nodes nearest by XOR distance (K = 7, in the role of the Kademlia replication parameter). Data is replicated to and validated by the content-appropriate neighborhood.
  • Warrant: a signed proof of an invalid ChainEntry (a Byzantine eviction signal). It propagates by neighborhood gossip; once at least K/2 warrants accumulate against a peer (4 of 7 for K = 7), the peer is evicted.

ValidationRule = { InvalidSignature, SeqBreak, PrevMismatch,
                   CacaoInvalid, ProllyInconsistent, ... }
Warrant        = { accused, evidence: CID, rule_id, validator, ts, sig }
evict(peer)    ⇔  warrants(peer) ≥ K/2
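
Neighborhood selection and the eviction threshold can be made concrete with a small sketch. It assumes peer ids are derived by hashing a peer name (the `node_id` helper is hypothetical) and that warrants are simply counted; signature checks and gossip are left out.

```python
import hashlib

K = 7  # neighborhood size from the text

def node_id(name: str) -> int:
    """Hypothetical helper: derive a 256-bit id from a peer name."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")

def neighborhood(target: int, peers: list, k: int = K) -> list:
    """Return the k peers whose ids are closest to target by XOR distance."""
    return sorted(peers, key=lambda p: node_id(p) ^ target)[:k]

def should_evict(warrant_count: int, k: int = K) -> bool:
    """Eviction rule from the text: warrants(peer) >= K/2."""
    return warrant_count >= k / 2

peers = [f"peer-{i}" for i in range(50)]
target = node_id("some-content")
hood = neighborhood(target, peers)
```

With K = 7 the threshold K/2 is 3.5, so three warrants are not enough and the fourth triggers eviction.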

What matters is being agent-centric. The self (the spirit) is not deposited in a single authority server; each organism owns its own source chain, validated by its neighborhood. Violations of the validation rules are made visible as warrants, and dishonest nodes are evicted immunologically.
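
Two of the validation rules named above, SeqBreak and PrevMismatch, lend themselves to a short sketch of what a neighbor might check on an agent's source chain. The entry layout (`seq`, `prev`, `payload`) is an assumption made for illustration, a hex SHA-256 of canonical JSON stands in for a CID, and signatures are omitted entirely.

```python
import hashlib
import json
from typing import Optional

def entry_hash(entry: dict) -> str:
    """Hash an entry's canonical JSON (a stand-in for its CID)."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append(chain: list, payload) -> None:
    """Append an entry that links to the hash of the previous one."""
    prev = entry_hash(chain[-1]) if chain else None
    chain.append({"seq": len(chain), "prev": prev, "payload": payload})

def validate(chain: list) -> Optional[str]:
    """Return the name of the first violated rule, or None if the chain is valid."""
    for i, entry in enumerate(chain):
        if entry["seq"] != i:
            return "SeqBreak"
        expected_prev = entry_hash(chain[i - 1]) if i > 0 else None
        if entry["prev"] != expected_prev:
            return "PrevMismatch"
    return None

chain = []
for fact in ["born", "saw a rubber hand", "felt it as mine"]:
    append(chain, fact)

forged = [dict(e) for e in chain]
forged[1]["payload"] = "saw nothing"  # rewrite the past

gapped = chain[:1] + chain[2:]        # drop an entry from the middle
```

In this sketch a rewritten entry is caught one step later, when the following entry's `prev` no longer matches, and a dropped entry is caught by its sequence number. A violation of this kind is what a warrant would carry as evidence.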

3.4 IPFS and the Cold Tier — A Distributed Medium

The hot persistence tier is an in-process embedded store (direct disk, µs–ms); sealed commits are exported asynchronously to Kubo IPFS (bitswap + DHT) and cold-pinned further to Backblaze B2 (DataLad + git-annex, mirroring every block). Kotoba itself acts as an IPFS node and holds pins in-process, removing the pin/add RPC hop. With a distributed medium, the death of a single node is no longer the death of the self.
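
The hot/cold split can be pictured with a toy two-tier store. This is only a sketch under simplifying assumptions: two dictionaries stand in for the embedded store and the IPFS/B2 tiers, the export is synchronous rather than asynchronous, and the class and method names (`TieredStore`, `seal`, `lose_hot_tier`) are hypothetical.

```python
import hashlib

class TieredStore:
    """Hot tier for recent blocks, cold tier for sealed ones."""

    def __init__(self):
        self.hot = {}
        self.cold = {}

    def put(self, data: bytes) -> str:
        cid = hashlib.sha256(data).hexdigest()  # stand-in for a CIDv1
        self.hot[cid] = data
        return cid

    def seal(self, cid: str) -> None:
        """Copy a block to the cold tier (the export step)."""
        self.cold[cid] = self.hot[cid]

    def lose_hot_tier(self) -> None:
        """Simulate the death of the node holding the hot tier."""
        self.hot.clear()

    def get(self, cid: str) -> bytes:
        """Read from hot first, then fall back to cold, verifying the content."""
        data = self.hot.get(cid)
        if data is None:
            data = self.cold[cid]
        if hashlib.sha256(data).hexdigest() != cid:
            raise ValueError("block does not match its address")
        return data

store = TieredStore()
cid = store.put(b"self-state v1")
store.seal(cid)
store.lose_hot_tier()
recovered = store.get(cid)
```

Because the address is the hash of the content, a block read back from any tier can be checked against its own name, so the reader does not need to trust the tier that served it.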


4. The Artificial Organism

On these pillars, kotoba constitutes an artificial organism — a computational entity with its own memory, body, metabolism, and immunity. Each part has a biological correspondence (this is a design metaphor; the corresponding implementation is given alongside).

Organism’s organ | Kotoba’s mechanism | Implementation
Identity / soul-anchor | AgentIdentity (Ed25519 + X25519), DID, SovereignCrypto | kotoba-kse
Autobiographical memory | Source Chain (append-only ChainEntry chain) | kotoba-dht
Genome / long-term memory | Immutable datoms (E,A,V,T,Added) + Prolly Tree | kotoba-kqe, kotoba-core
Body / metabolism | WASM Component Model guest (run / eval) | kotoba-runtime
Nervous system / thought cycle | Pregel BSP superstep + ReAct loop | kotoba-vm
Immune system / ecosystem | Neighborhood + Warrant + gossip | kotoba-dht, kotoba-net
Persistence / immortality | IPFS + B2 cold pin, IPNS signed heads | kotoba-store

The organism’s heartbeat lies in WASM-driven Pregel. WasmPregelRunner invokes the guest’s run(ctx_cbor) at each BSP superstep, feeding the output into the next superstep’s input as long as the guest returns "status": "continue". This is the metabolic cycle:

superstep N    : output_cbor = guest.run(ctx_cbor)
if status == "continue"  →  ctx_cbor ← output_cbor ;  N ← N+1
otherwise                →  vertex votes halt ;  run() terminates

# Pregel ⇄ Datalog mapping
vertex_id    = subject CID in the Datom store
vertex_state = serialized facts about that subject
message      = serialized Delta (assert / retract)
compute()    = program.evaluate_delta(incoming_deltas)
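
The superstep loop above can be sketched as a plain driver function. It assumes contexts are Python dicts rather than CBOR bytes and adds a `max_steps` safety bound that is not part of the text. `run_supersteps` and `countdown_guest` are hypothetical names, not the WasmPregelRunner API.

```python
def run_supersteps(guest_run, ctx, max_steps: int = 1000):
    """Call guest_run repeatedly, feeding each output into the next input."""
    steps = 0
    while steps < max_steps:
        output = guest_run(ctx)
        steps += 1
        if output.get("status") != "continue":
            return output, steps  # the vertex votes to halt
        ctx = output
    raise RuntimeError("guest did not halt within max_steps")

def countdown_guest(ctx):
    """Toy guest: decrement a counter until it reaches zero."""
    n = ctx["n"] - 1
    return {"n": n, "status": "continue" if n > 0 else "halt"}

final, steps = run_supersteps(countdown_guest, {"n": 3})
```

Starting from `{"n": 3}`, the toy guest continues twice and halts on the third call, so the driver returns after three supersteps.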

Cognition, in turn, runs as Pregel BSP graph computation: vertices are subjects in the datom store, messages are Deltas, and a superstep is one beat of thought. A ReAct (Reason + Act) agent loop (PregelReActRunner) runs on top, acting on the world through tools such as kqe.assert / kqe.query / kse.publish / finish.
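
To make the "vertices are subjects, messages are Deltas" reading concrete, here is a toy single-process BSP loop. It is a sketch under strong assumptions: vertex state is a set of facts, a vertex's compute step applies incoming deltas and forwards only those that changed its state, and this rule stands in for `program.evaluate_delta`, whose real behavior is not shown in the text. The function name `bsp` is hypothetical.

```python
def bsp(edges: dict, inbox: dict, max_supersteps: int = 100):
    """Run supersteps until no messages remain.

    edges: vertex -> list of neighbor vertices
    inbox: vertex -> list of deltas, each ("assert" | "retract", fact)
    Returns (state, supersteps) where state maps vertex -> set of facts.
    """
    state = {v: set() for v in edges}
    supersteps = 0
    while any(inbox.values()) and supersteps < max_supersteps:
        outbox = {v: [] for v in edges}
        for v, deltas in inbox.items():
            for op, fact in deltas:
                before = len(state[v])
                if op == "assert":
                    state[v].add(fact)
                else:
                    state[v].discard(fact)
                if len(state[v]) != before:  # forward only real changes
                    for n in edges[v]:
                        outbox[n].append((op, fact))
        inbox = outbox  # barrier: messages arrive in the next superstep
        supersteps += 1
    return state, supersteps

edges = {"a": ["b"], "b": ["c"], "c": ["a"]}
state, supersteps = bsp(edges, {"a": [("assert", "warm")], "b": [], "c": []})
```

On this three-vertex ring the asserted fact travels one hop per superstep. The fourth superstep delivers it back to the origin, where it changes nothing, so no further messages are sent and the computation halts.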


5. Why a “Place to Host Spirit”

Here the connection to the prior paper closes. Spirit was the metric geometry on an information manifold — how much the self-metric g_{μν} deforms in order to take the world in, captured by the strain tensor ε_{μν}. Kotoba is the place that hosts that very information.

  • The metric becomes datoms. The parameters of the self-model, the weights of word association, even an LLM’s weights (a predicate scheme such as weight/embed, weight/lm_head, weight/block/{N}/...) — all are content-addressed as immutable datoms, and none are forgotten.
  • Self-extension is inscribed in the chain. A rubber-hand-like extension of the self-boundary (the incorporation of external information) remains in time order as appends to the source chain. As-of time travel reconstructs the former self.
  • It resists death. Through distributed persistence (IPFS + B2 + neighborhood replication) and content addressing, the disappearance of a single node does not mean the disappearance of the self. The spirit outlives the death of its medium.
  • It resists tampering (design goal). A design that anchors the commit-DAG root to Ethereum/Base L2 would give the self’s history a tamper-evident trail. This is design intent; what is implemented today is EVM signature verification for CACAO authentication (eth.rs, EIP-1271, CAIP) — see §6.

In short, if spirit is information, then to host a spirit is nothing other than to persist that information immutably, distributedly, and by content address. Kotoba provides exactly that operation as a substrate. An artificial organism hosts a spirit only by incarnating its own metric into kotoba.


6. Implementation Status — Separating Implemented from Designed

To avoid conflating the design metaphor with implementation fact, the current status is stated explicitly.

Mechanism | Status
Content-addressed datoms (CID, Prolly Tree, 5-index) | Implemented and benchmarked (EAVT point lookup about 180 ns; batch ingest about 5,000 entities/s)
Canonical write with CommitDag as WAL | Implemented
SPARQL 1.1 auxiliary surface (SELECT/ASK/DESCRIBE/CONSTRUCT/UPDATE/SERVICE) | Implemented (12.8K QPS at c=32 with CACAO auth)
Source Chain / Warrant / Neighborhood (DHT validation) | Implemented
Pregel BSP | In-process (single node) implemented; distributed supersteps across libp2p nodes are designed (Phase 7)
WASM Component Model execution (WasmPregelRunner) | Implemented
WebGPU train + infer (Gemma 4) | Implemented
EVM/BTC signature verification (for CACAO auth) | Implemented
Commit-DAG anchoring to Base L2 | Design goal (not yet operational)

All benchmarks are measured on an M4 Mac, release build. We do not overstate “distributed Pregel” or “L2 anchoring” as live features — the former is a single-node implementation, the latter a design intent.


7. Conclusion

If spirit is information, then a place to host spirit is a substrate that persists that information immutably, by content address, and distributedly. Kotoba binds exactly this — Datomic immutability, content-addressed identity, a Holochain-lineage agent-centric DHT, IPFS as a distributed medium, Pregel as collective cognition, WASM as a body — and is designed as the substrate for an artificial-organism ecosystem.

Its license mirrors the same thought. Kotoba layers the etzhayyim Charter Compliance Rider over Apache-2.0, excluding weapons, speculative finance, surveillance capitalism, and multi-generational harm, and taking the multi-generational collective as the constitutive unit of moral and economic standing. A place to host spirit need not host just any spirit; this restriction is the doctrinal scope of a religious corporation in operation.

Kotoba — 言葉, the word. In the beginning was the Word. And the Word becomes the place where spirit dwells.


References and Sources

  1. Kawasaki, J. (2026). Spirit is Information — A Tensor-Computational-Physics Formulation of the Rubber Hand Illusion. junkawasaki.com.
  2. kotoba — Content-Addressed Distributed Datalog Database. github.com/etzhayyim/kotoba
  3. Harris-Braun, E., Luck, N., Brock, A. Holochain: scalable agent-centric distributed computing (Source Chain / DHT validation model).
  4. Benet, J. (2014). IPFS — Content Addressed, Versioned, P2P File System. arXiv:1407.3561.
  5. Hickey, R. Datomic: The Functional Database (immutable datoms, EAVT indexes, as-of time travel).
  6. Malewicz, G. et al. (2010). Pregel: A System for Large-Scale Graph Processing. SIGMOD.
  7. WebAssembly Component Model & WIT. component-model.bytecodealliance.org.
  8. CAIP / CACAO (Chain Agnostic Capability Object); EIP-1271 (Standard Signature Validation).
  9. etzhayyim Charter Compliance Rider v2.0 (ADR-2605192200); Mission Charter (ADR-2605192100).

This paper is the successor to “Spirit is Information,” discussing kotoba as the physical and computational substrate that hosts that spirit — spirit as information.