Reading Language as Graph Topology

A summary and blog-side reconstruction of the preprint Languages as Graph Topologies (Kawasaki, 2026).

1. Starting Point — Grammar is the Shape of Computation

English, Japanese, Hindi, French, Arabic. Word orders split into SVO, SOV, VSO. This paper’s claim: that split is not surface reshuffling — it is a difference in computational strategy, in how each language bundles and processes information.

Stop looking at a sentence as a string of words. Look at it as a dependency graph G = (V, E): vertices are words, edges are head→dependent relations. A different topology emerges for each language.
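
As a minimal sketch of that representation, a sentence can be stored as a list of positioned words plus a list of head→dependent edges. The type and field names below (`DepGraph`, `Edge`, `headPos`, `depPos`) are illustrative assumptions, not the paper's notation, and the determiner attachments follow the common convention that a noun heads its article.

```haskell
-- Hypothetical encoding (not from the paper): a word is identified by its
-- 1-based position in the sentence; an edge points from head to dependent.
data Edge = Edge { headPos :: Int, depPos :: Int } deriving (Eq, Show)

data DepGraph = DepGraph
  { vertices :: [(Int, String)]  -- (position, word form)
  , edges    :: [Edge]           -- head -> dependent
  } deriving (Eq, Show)

-- "The cat chased the mouse": chased (3) heads cat (2) and mouse (5);
-- each noun heads its own determiner.
catSentence :: DepGraph
catSentence = DepGraph
  { vertices = zip [1 ..] ["The", "cat", "chased", "the", "mouse"]
  , edges    = [Edge 3 2, Edge 2 1, Edge 3 5, Edge 5 4]
  }

-- The root is the vertex that never appears as a dependent.
roots :: DepGraph -> [Int]
roots g = [v | (v, _) <- vertices g, v `notElem` map depPos (edges g)]
```

Here `roots catSentence` picks out position 3, the verb, which is the vertex every other word ultimately hangs from.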

2. Five Topological Metrics

The paper defines five:

Metric	Definition	Reads as
Tree depth `D(T)`	`max_v dist(r, v) / |V|`	depth of embedding, normalized by sentence length
Branching factor `B(T)`	mean out-degree of internal nodes	how wide each head fans
Head Directionality Index (HDI)	fraction of edges where head precedes dependent	head-initial vs. head-final
Crossing Number (CN)	fraction of arc pairs that cross	non-projectivity
Mean Dependency Distance (MDD)	mean linear distance between head and dependent	working-memory load
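
The five metrics can be sketched directly on a list of arcs. This is a hedged illustration rather than the paper's implementation: the `Arc` type, the function names, and the choice to treat crossings as strict interleaving of arc spans are assumptions, and the depth entry is read as the longest root-to-node path divided by the number of words.

```haskell
import Data.List (nub, tails)

-- An arc is (head position, dependent position), positions 1-based.
type Arc = (Int, Int)

-- Safe Int ratio as a Double (0 when the denominator is 0).
ratio :: Int -> Int -> Double
ratio _ 0 = 0
ratio n d = fromIntegral n / fromIntegral d

-- HDI: fraction of arcs whose head precedes its dependent.
hdi :: [Arc] -> Double
hdi arcs = ratio (length [() | (h, d) <- arcs, h < d]) (length arcs)

-- MDD: mean linear distance between head and dependent.
mdd :: [Arc] -> Double
mdd arcs = ratio (sum [abs (h - d) | (h, d) <- arcs]) (length arcs)

-- Two arcs cross when their spans strictly interleave.
crosses :: Arc -> Arc -> Bool
crosses (a, b) (c, d) =
  (l1 < l2 && l2 < r1 && r1 < r2) || (l2 < l1 && l1 < r2 && r2 < r1)
  where
    (l1, r1) = (min a b, max a b)
    (l2, r2) = (min c d, max c d)

-- CN: fraction of unordered arc pairs that cross.
cn :: [Arc] -> Double
cn arcs = ratio (length (filter (uncurry crosses) pairs)) (length pairs)
  where pairs = [(x, y) | (x : rest) <- tails arcs, y <- rest]

-- D(T): longest root-to-node path over |V| (a tree has |E| + 1 vertices).
depth :: Int -> [Arc] -> Double
depth root arcs = ratio (go root) (length arcs + 1)
  where go v = maximum (0 : [1 + go d | (h, d) <- arcs, h == v])

-- B(T): mean out-degree over nodes that have at least one dependent.
branching :: [Arc] -> Double
branching arcs = ratio (length arcs) (length (nub (map fst arcs)))

-- neko-ga (1) nezumi-wo (2) oikaketa (3): both arguments depend on the verb.
japanese :: [Arc]
japanese = [(3, 1), (3, 2)]
```

On the single sentence `japanese`, `hdi` is 0 (fully head-final), `mdd` is 1.5, and `cn` is 0. The values quoted in the sections below are corpus-level figures, so a one-sentence result will not match them.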

On top of that, Betti numbers from algebraic topology: β₀ counts connected components, β₁ counts independent 1-cycles. Projective trees have β₁ = 0; non-projective constructions push β₁ > 0. That gives a topological invariant for grammatical freedom.
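
For a plain graph, both Betti numbers have a simple closed form: β₀ is the number of connected components and β₁ = |E| − |V| + β₀. The sketch below computes that baseline only. On a bare dependency tree it always gives β₁ = 0, so the non-zero, fractional values quoted for the five languages presumably come from the paper's own construction, which this sketch does not attempt to reproduce. The names `components`, `betti0`, and `betti1` are illustrative.

```haskell
import Data.List (partition)

type Edge = (Int, Int)

-- Merge vertex groups edge by edge; the groups that remain are the
-- connected components. Assumes every edge endpoint is listed in vs.
components :: [Int] -> [Edge] -> [[Int]]
components vs es = foldl merge [[v] | v <- vs] es
  where
    merge groups (a, b) =
      case partition (\g -> a `elem` g || b `elem` g) groups of
        ([], _)         -> groups               -- unknown endpoints: ignore
        (touched, rest) -> concat touched : rest

-- beta_0: number of connected components.
betti0 :: [Int] -> [Edge] -> Int
betti0 vs es = length (components vs es)

-- beta_1 of a graph: |E| - |V| + beta_0 (number of independent cycles).
betti1 :: [Int] -> [Edge] -> Int
betti1 vs es = length es - length vs + betti0 vs es

-- A three-word dependency tree, and the same tree with one extra edge.
treeEdges, cyclicEdges :: [Edge]
treeEdges   = [(3, 1), (3, 2)]
cyclicEdges = [(3, 1), (3, 2), (1, 2)]
```

With vertices `[1, 2, 3]`, `betti1` is 0 for `treeEdges` and 1 for `cyclicEdges`: one extra relation between the two arguments closes exactly one independent cycle.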

3. Three Computational Shapes

Plot the five languages on these axes and they fall into three topological classes. Each class doubles as an information-processing metaphor, using primitives familiar from functional and parallel programming.

Figure: Three computational topologies of language.

3.1 Convergent — `fold` (Japanese, Hindi · SOV)

Sentence: neko-ga nezumi-wo oikaketa — “the cat chased the mouse.” Every dependency arc converges on the sentence-final verb oikaketa. HDI ≈ 0.15 (head-final), CN ≈ 0.10 (mostly projective), β₁ ≈ 0.05 (most tree-like of the five).

This is fold in functional terms. Arguments accumulate left to right and the final verb closes the reduction. Japanese case particles (-ga, -wo, -ni) are argument labels; the verb is a function that takes the accumulator and evaluates.

-- Japanese sentence as a fold
sentence = verb (foldl (flip (:)) [] [subject, object])
-- Arguments are pushed onto a stack left to right; the verb at the end pops them and computes.
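
The fold reading can be made concrete with a small runnable sketch. Everything here is a toy assumption for illustration: the hyphenated romanization is split into stem and particle, the particle acts as the argument label, and `oikaketa` is a hypothetical verb function that reads the accumulated frame.

```haskell
-- Hypothetical toy lexicon: split "neko-ga" into (particle, stem).
splitParticle :: String -> (String, String)
splitParticle w = (drop 1 particle, stem)
  where (stem, particle) = break (== '-') w

-- Fold step: push each labelled argument onto the accumulator.
accumulate :: [(String, String)] -> String -> [(String, String)]
accumulate acc w = splitParticle w : acc

-- The sentence-final verb closes the reduction by reading the labels.
oikaketa :: [(String, String)] -> Maybe String
oikaketa frame = do
  agent   <- lookup "ga" frame
  patient <- lookup "wo" frame
  pure ("chase(" ++ agent ++ ", " ++ patient ++ ")")

-- Arguments accumulate left to right; the verb evaluates last.
sentence :: Maybe String
sentence = oikaketa (foldl accumulate [] ["neko-ga", "nezumi-wo"])
```

Nothing is evaluated until the verb arrives: the fold only builds the frame, and `oikaketa` is applied once at the end.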

3.2 Divergent — `apply` (English, French · SVO)

Sentence: “The cat chased the mouse.” The verb chased commits early, branching left to subject and right to object. HDI ≈ 0.75 (head-initial), β₁ ≈ 0.10.

This is apply. The verb commits as a function early, and the parser forward-predicts to fill the remaining argument slots. The structural room English leaves for gaps (What did you give ___?) comes directly from this topology — the verb’s argument frame is fixed before the arguments arrive.

-- English sentence as an application
sentence = chased subject object
-- Verb is curried; arguments are supplied as they appear.
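
The gap construction fits the same picture as partial application. In this hedged sketch, `give` is a hypothetical two-slot verb frame with a toy string output: the question "What did you give ___?" supplies the giver and leaves the theme slot open.

```haskell
-- Hypothetical toy semantics: a verb frame as a curried function.
give :: String -> String -> String
give giver theme = "give(" ++ giver ++ ", " ++ theme ++ ")"

-- "What did you give ___?": the frame is fixed, one slot is still open,
-- so the question is a function waiting for its last argument.
whatDidYouGive :: String -> String
whatDidYouGive = give "you"

-- Answering the question fills the gap.
answer :: String
answer = whatDidYouGive "a book"
```

The partially applied `whatDidYouGive` is only well-typed because the argument frame of `give` is known before the missing argument arrives, which is the property the paragraph above attributes to the divergent topology.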

3.3 Broadcast — `scatter` (Arabic · VSO)

Sentence: ṭārada qiṭṭun faʾran (verb-subject-object). The verb sits at the front; subject and object branch simultaneously from the root. HDI ≈ 0.60, β₁ ≈ 0.22 — the highest of the five.

This is closer to MPI’s scatter / broadcast. The verb projects multiple slot requirements at once, and rich agreement morphology (the verb inflects for subject person/number) compresses the subject channel. The high word-order freedom — high β₁ — is licensed precisely because the morphology, not position, carries the grammatical relation.
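
A minimal sketch of the idea that morphology, not position, carries the grammatical relation: each argument is tagged with its role, and the verb fills its slots by tag. This is not MPI code; the `Role` type, the tags, and the `chaseV` function are illustrative assumptions standing in for agreement and case marking.

```haskell
data Role = Subject | Object deriving (Eq, Show)

-- Each argument carries its grammatical role as a tag (standing in for
-- morphological marking), so linear position carries no information.
type Marked = (Role, String)

-- The verb announces both slots up front and fills them by role.
chaseV :: [Marked] -> Maybe String
chaseV args = do
  s <- lookup Subject args
  o <- lookup Object args
  pure ("chase(" ++ s ++ ", " ++ o ++ ")")

-- Same arguments, two linear orders after the verb.
vso, vos :: Maybe String
vso = chaseV [(Subject, "cat"), (Object, "mouse")]
vos = chaseV [(Object, "mouse"), (Subject, "cat")]
```

Both orders evaluate to the same result, which is the sense in which word-order freedom is licensed once the labels travel with the arguments.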

4. Languages as Clusters in Topological Space

Plot the five languages on HDI vs. β₁ and the three clusters separate geometrically as well.

Figure: HDI × β₁ scatter plot showing the three clusters.

The interesting result is that Japanese and Hindi separate. Both are tagged SOV / head-final, but Japanese is projective (β₁ ≈ 0.05) while Hindi, which allows scrambling, is markedly non-projective (β₁ ≈ 0.18). The discrete label “SOV” cannot see this difference; the topological invariants capture it as a continuous quantity.

5. Why Topology — The Shape/Computation Trade-off

The paper’s central theorem (Topology–Computation Correspondence) claims that the computational strategy C(L) of a language L is a function of its topological feature vector (HDI, CN, β₁).

C(L) = f( HDI(T), CN(T), β₁(T) )

And the basic trade-off the metrics reveal:

Encode grammatical relations in position (β₁ → 0) or in morphology (β₁ ↑).

The two are information-theoretically equivalent. Japanese and English pick the first; Arabic and Hindi pick the second. Both preserve the root-directed dependency structure — what differs is how it is preserved, and that difference shows up in topology.
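
To make the correspondence C(L) = f(HDI, CN, β₁) tangible, here is a toy classifier over the values quoted earlier in this post. The thresholds (0.5 for HDI, 0.2 for β₁) are assumptions chosen only to reproduce the three classes; they are not taken from the paper. CN is left out because this summary reports it for Japanese alone.

```haskell
data Features = Features { hdi :: Double, beta1 :: Double }

data Strategy = Convergent | Divergent | Broadcast deriving (Eq, Show)

-- Hypothetical decision rule: head-final languages are convergent;
-- head-initial languages split on how non-projective they are.
classify :: Features -> Strategy
classify f
  | hdi f < 0.5    = Convergent
  | beta1 f >= 0.2 = Broadcast
  | otherwise      = Divergent

-- Values as quoted in this post.
japanese, english, arabic :: Features
japanese = Features { hdi = 0.15, beta1 = 0.05 }
english  = Features { hdi = 0.75, beta1 = 0.10 }
arabic   = Features { hdi = 0.60, beta1 = 0.22 }
```

A hard threshold hides the within-class spread that the HDI vs. β₁ plot shows, so this is a reading aid rather than a substitute for the continuous measures.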

6. Implications for NLP — Should Architectures See Topology?

Today’s transformers process every language with the same architecture. Positional encoding is linear, and the structural difference between fold languages and apply languages lives only inside the attention weights.

But by the paper’s argument, that is a topological mismatch. For a language like Japanese — arguments converging on a final verb — a fold-friendly inductive bias (right-to-left evaluation order, a causal mask oriented toward the final head) is in principle the natural fit.
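
One possible reading of that bias, sketched as attention masks. The standard causal mask lets position i attend to positions j ≤ i. The reversed variant below is an assumption about what "right-to-left evaluation order" could mean, not a design given in the paper.

```haskell
-- Standard causal mask: row i, column j is True when j <= i.
causalMask :: Int -> [[Bool]]
causalMask n = [[j <= i | j <- [1 .. n]] | i <- [1 .. n]]

-- Hypothetical fold-friendly variant: position i may attend to every
-- position j >= i, so each word can see the sentence-final head.
reverseCausalMask :: Int -> [[Bool]]
reverseCausalMask n = [[j >= i | j <- [1 .. n]] | i <- [1 .. n]]

-- Render a mask with '1' for allowed and '0' for blocked positions.
render :: [[Bool]] -> [String]
render = map (map (\allowed -> if allowed then '1' else '0'))
```

For a three-word sentence, `render (reverseCausalMask 3)` gives the rows `111`, `011`, `001`: every row has its last column set, so every position can attend to the final verb.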

The implication runs deeper than parsing accuracy: varying inductive bias per language family is a candidate counter-argument to the universal-architecture hypothesis behind transformers.

7. Summary

A sentence is a dependency graph. A graph has topology.
Take HDI and β₁ and the five languages split into three clusters: convergent / divergent / broadcast.
Each cluster corresponds to a computational primitive: fold / apply / scatter.
Topology is the geometric description of a single trade-off: encode grammar in position or in morphology.

Languages are not isolated grammars. They are different computational machines for processing information. Topology is the lens that shows it.

For the full definitions, Betti number computations, and per-language corpus analysis, see the preprint PDF at com-junkawasaki/kotoba-topology.

References

Kawasaki, J. (2026). Languages as Graph Topologies: A Computational Framework for Cross-Linguistic Syntactic Structure Analysis. Preprint.
Tesnière, L. (1959). Éléments de syntaxe structurale.
Mel’čuk, I. (1988). Dependency Syntax.
Liu, H. (2008). “Dependency Distance as a Metric of Language Comprehension Difficulty.”
Futrell, R., Mahowald, K., Gibson, E. (2015). “Large-scale evidence of dependency length minimization in 37 languages.”
Hatcher, A. (2002). Algebraic Topology. Cambridge UP.
Greenberg, J. H. (1963). “Some universals of grammar with particular reference to the order of meaningful elements.”
