Guardian Agents and the Philosophy of AI Accountability

How Wittgenstein, Berlin, Ostrom, and Te Ao Māori Converge in a Production Governance Architecture

John Stroh · March 2026

Guardian Agents are a four-phase AI governance system deployed in production by Village, a sovereign community platform. This article traces the philosophical genealogy of their architecture — arguing that the design decisions which make Guardian Agents technically distinctive (mathematical verification instead of generative checking, sovereign processing, human authority preservation, tenant-scoped governance) are not engineering choices that happen to align with philosophical positions, but philosophical commitments that demanded specific engineering responses.

The article draws on Wittgenstein’s distinction between the sayable and unsayable, Berlin’s value pluralism, Ostrom’s polycentric governance, Alexander’s living systems, and Te Ao Māori frameworks of data sovereignty to show how traditions separated by a century and a hemisphere converge on the same architectural requirements for governing AI in community contexts.

I. The Problem: Who Watches the Watchers?

Every AI governance architecture must answer a foundational question: who verifies the verifier?

The standard industry approach — using additional AI models to evaluate AI output — is an engineering response to an engineering problem. It treats verification as a scaling challenge: add more layers, more models, more probabilistic checks. The assumption is that enough independent AI systems checking each other will converge on reliability.

This assumption has a name in safety engineering: common-mode failure. When the verification layer and the generation layer share fundamental properties — both are probabilistic, both hallucinate, both reward confident outputs over calibrated uncertainty — they share fundamental failure modes. The checker confirms the error because the checker reasons the same way as the system it checks.

We encountered this directly. When an AI coding assistant produced a detailed but fundamentally flawed analysis of a database configuration, we asked the same system to write an audit script to verify its work. The audit script shared the same blind spot. It used the same flawed understanding of the domain, applied the same reasoning patterns, and reached the same wrong conclusion — that its original analysis was correct.

This is not an anomaly. It is a structural property of using generative systems to verify generative systems. And it is the starting point for understanding why Guardian Agents are built the way they are.

The question “who watches the watchers?” is not new. Juvenal posed it two thousand years ago. The philosophical traditions that inform Guardian Agents have been working on versions of this question for decades. What is new is the engineering context: AI systems that are confident, capable, and wrong in ways that are increasingly difficult for humans to detect.

II. Four Philosophical Commitments

Wittgenstein: The Boundary Between the Sayable and the Unsayable

Ludwig Wittgenstein’s Tractatus Logico-Philosophicus (1921) draws a line between what can be expressed in propositions (the sayable) and what cannot (the unsayable). Proposition 7 — “Whereof one cannot speak, thereof one must be silent” — is not a counsel of defeat. It is an epistemological commitment: some things can be systematised and some cannot, and confusing the two produces nonsense.

This distinction became the foundational architectural principle of Village AI. Technical optimisations, pattern matching, information retrieval — these belong to computational systems. Value hierarchies, cultural protocols, grief processing, strategic direction — these belong to human judgment. The governance framework enforces this boundary not through policy documents but through code: a boundary enforcement service classifies every decision type and blocks AI from acting autonomously on anything outside the technical domain.
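The boundary enforcement service described above can be sketched as a default-deny classifier. The decision-type labels below are hypothetical, since the actual taxonomy is not published; what matters is the fail-safe shape, where anything not explicitly classified as technical is blocked from autonomous action.

```python
# Hypothetical decision-type labels; the real service's taxonomy is not public.
TECHNICAL = {"cache_tuning", "index_selection", "retrieval_ranking"}

def may_act_autonomously(decision_type: str) -> bool:
    """AI may act alone only on decisions classified as technical.

    Unknown decision types, and anything in the human domain (values,
    cultural protocols, strategy), default to blocked until a human
    classifies them. Default-deny enforces the boundary by construction.
    """
    return decision_type in TECHNICAL
```

The design choice here is that silence, in Wittgenstein's sense, is the default: the system must earn permission to act, never permission to refuse.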

The Architectural Implication

Embedding cosine similarity — the mathematical operation at the heart of Guardian verification — determines how closely an AI response aligns with source material. This is measurement, not interpretation. It belongs firmly in the domain of the sayable. The AI that generated the response operates in a space that inevitably touches the unsayable. The guardian that verifies the response operates entirely in the sayable — it computes distances between vectors. The watcher is not another speaker. The watcher is a measuring instrument.
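The measuring-instrument character of the operation is visible in the mathematics itself. A minimal stdlib sketch of cosine similarity over two embedding vectors:

```python
import math

def cosine_similarity(a, b):
    """Measure directional alignment between two embedding vectors.

    Returns a value in [-1, 1]; 1.0 means identical direction. This is
    measurement, not judgment: the function asserts nothing about truth,
    only about distance in embedding space.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)
```

Nothing in this function can hallucinate; given the same vectors it always returns the same number, which is precisely why it occupies a different epistemic domain from the generator it checks.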

Berlin: Value Pluralism and the Rejection of Optimisation

Isaiah Berlin’s central thesis in Two Concepts of Liberty (1958) and Four Essays on Liberty (1969) is that legitimate human values are irreducibly plural and sometimes genuinely incommensurable. Justice and mercy, liberty and equality, individual privacy and collective memory — these are not competing approximations of some higher meta-value. They are genuinely different things, each valuable in its own right, and the pursuit of one sometimes necessarily requires the sacrifice of another.

This has a devastating implication for AI governance: there is no objective function that resolves value conflicts. Any system that claims to “optimise” across incommensurable values is not being neutral — it is imposing a hidden hierarchy. Berlin’s work demands that an AI governance system never assume a default value ranking, never silently resolve a value conflict, and always make visible what is sacrificed in every decision.

The Architectural Implication

Guardian Agents inherit this commitment in their tenant-scoped architecture. What counts as an anomaly in a parish archive — where accuracy about historical dates is paramount — differs fundamentally from what counts as an anomaly in a neighbourhood coordination group — where timeliness matters more than precision. These are not different calibrations of the same value. They are different values, irreducibly so. Each community defines its own principles, its own anomaly baselines, its own threshold overrides. The platform provides safety floors. Communities provide value direction.

Berlin also illuminates why the evidence burden for Guardian threshold changes is deliberately asymmetric. Loosening a safety threshold requires 85% confidence. Tightening a threshold requires only 60%. This asymmetry reflects Berlin’s insight that the consequences of error are not symmetric across value dimensions. A false negative — missing a real problem — is worse than a false positive — flagging a non-problem — because the false negative silently erodes the community’s epistemic ground.
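The asymmetry can be stated in a few lines. The 85% and 60% figures come from the article; the function shape around them is an illustrative sketch, not the platform's actual implementation.

```python
# Evidence thresholds from the article; the gating function is illustrative.
LOOSEN_REQUIRED = 0.85   # relaxing a safety threshold needs strong evidence
TIGHTEN_REQUIRED = 0.60  # tightening needs comparatively less

def change_permitted(direction: str, confidence: float) -> bool:
    """Gate a proposed threshold change on direction-dependent evidence."""
    if direction == "loosen":
        return confidence >= LOOSEN_REQUIRED
    if direction == "tighten":
        return confidence >= TIGHTEN_REQUIRED
    raise ValueError(f"unknown direction: {direction!r}")
```

The same body of evidence, at 70% confidence, permits tightening but not loosening: the value judgment about asymmetric error costs lives in two named constants, visible and overridable.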

Ostrom: Polycentric Governance and the Commons

Elinor Ostrom’s Nobel Prize-winning research in Governing the Commons (1990) demonstrated that communities govern shared resources effectively through polycentric governance — multiple independent centres of authority operating without hierarchical subordination. Her conditions for effective commons governance (clear boundaries, collective-choice arrangements, monitoring, graduated sanctions, conflict resolution, nested enterprises) map to multi-tenant AI governance with remarkable precision.

The Architectural Implication

The monitoring architecture enforces a strict privacy boundary that creates genuinely independent verification centres: tenant moderators see full content for their own community; platform administrators see only aggregate metrics. Neither authority can override the other. Neither has access to the other’s domain. This is not role-based access control as a security measure — it is polycentric governance as an architectural principle.

Ostrom’s insight about “nested enterprises” appears in the Guardian threshold override system. Overrides can be tenant-specific (a community adjusting its own sensitivity) or platform-wide (a baseline safety change). The resolution order is explicit: tenant overrides take precedence over platform overrides, which take precedence over frozen defaults. This nesting ensures that local governance is not subordinated to platform-level decisions while platform safety floors remain enforceable.
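The nested resolution order reads naturally as a lookup chain. This sketch is an assumption about mechanism: in particular, the final clamp against a safety floor is my reading of how “platform safety floors remain enforceable” could coexist with tenant precedence, not a documented detail.

```python
def resolve_threshold(metric, tenant_overrides, platform_overrides,
                      frozen_defaults, safety_floors=None):
    """Resolve a threshold: tenant > platform > frozen default.

    The result is then clamped so it never drops below the platform
    safety floor (a hypothetical enforcement mechanism, assumed here).
    """
    if metric in tenant_overrides:
        value = tenant_overrides[metric]
    elif metric in platform_overrides:
        value = platform_overrides[metric]
    else:
        value = frozen_defaults[metric]
    if safety_floors and metric in safety_floors:
        value = max(value, safety_floors[metric])
    return value
```

Local authority wins every contest it is allowed to enter; the floor defines the contests it is not.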

The “who watches the watchers?” question receives an Ostromian answer: everyone watches everyone, within clearly defined jurisdictional boundaries. No single authority is root. No single point of failure exists.

Te Ao Māori: Data Sovereignty as Governance Principle

Indigenous data sovereignty frameworks — particularly Te Mana Raraunga’s six principles (rangatiratanga, whakapapa, whanaungatanga, kotahitanga, manaakitanga, kaitiakitanga) and the CARE Principles for Indigenous Data Governance (Collective Benefit, Authority to Control, Responsibility, Ethics) — provide what is perhaps the most directly architectural of the philosophical inputs to Guardian Agents.

Where Wittgenstein offers an epistemological distinction, Berlin a theory of values, and Ostrom a governance model, Te Ao Māori frameworks offer a complete account of the relationship between data, community, and authority. Data about a community belongs to that community — not to a platform, not to a researcher, not to a government. The community exercises rangatiratanga (self-determination) over its own data. The platform exercises kaitiakitanga (guardianship) — a fiduciary obligation to protect, not own.

The Architectural Implication

When we say “all guardian processing runs on the community’s own infrastructure” and “no data leaves the tenant boundary for safety checks,” we are not describing a technical preference for on-premises computing. We are implementing rangatiratanga: the community’s right to govern what happens to its own data, including the governance mechanisms applied to it.

A critical note on intellectual honesty: these frameworks were developed by and for Indigenous peoples. Their application to a software platform built by non-Māori developers is an act of learning from, not speaking for. The architectural principles are drawn from published frameworks (Te Mana Raraunga Charter, CARE Principles, OCAP Principles) with explicit acknowledgment that implementation in Indigenous contexts would require Indigenous governance, consent, and co-design.

III. Convergence: Why These Traditions Demand This Architecture

These four traditions — separated by a century and a hemisphere, developed in contexts ranging from early twentieth-century Vienna to contemporary Aotearoa New Zealand — converge on the same architectural requirements.

Mathematical Verification

Wittgenstein demands verification in a different epistemic domain from generation. Berlin requires transparency. Embedding cosine similarity satisfies both: mathematical, deterministic, and epistemically distinct from the generation process it verifies.

Sovereign Processing

Te Ao Māori requires that data governance be exercised by the community that owns the data. Ostrom requires genuinely independent governance centres. Together: guardian processing runs locally, with no external dependency.

Human Authority

Wittgenstein’s unsayable cannot be delegated to machines. Berlin’s incommensurable values cannot be algorithmically resolved. Ostrom requires human participation. Te Ao Māori requires community self-determination. The guardian proposes, the human decides.

Tenant-Scoped Governance

Berlin’s value pluralism means different communities hold different values. Ostrom means multiple independent authorities. Te Ao Māori means each community governs its own domain. Governance is scoped to the community, not universalised across a platform.

The convergence is not coincidental. Each tradition, from its own starting point, has been working on the same fundamental problem: how to govern shared resources and collective decisions without imposing a single authority’s values on everyone else. That this problem is now appearing in AI governance does not make it new. It makes it urgent.

IV. Embedding Similarity as an Epistemological Commitment

The choice to use embedding cosine similarity as the primary verification mechanism in Guardian Agents deserves philosophical attention beyond its technical merits.

Standard AI safety research treats verification as a classification problem: is this response safe or unsafe, accurate or inaccurate, aligned or misaligned? Classification presupposes categories, and categories presuppose values. The decision about where to draw the boundary between “safe” and “unsafe” is itself a values decision — one that Berlin would insist cannot be made algorithmically.

Embedding similarity does not classify. It measures distance. The question is not “is this response accurate?” but “how closely does this response align with what the community actually knows?”

The difference is epistemologically significant. Classification asserts knowledge (“this is safe”). Measurement provides evidence (“this is 0.73 similar to source material”). The human who sees the measurement decides what to do with it. The system that provides the measurement does not need to know what counts as “good enough” — that is a values question, scoped to the community, decided by moderators.

This is why confidence badges present a score-derived tier (verified, partially verified, unverified) rather than a binary safe/unsafe label. The tier is informational. The human interprets it. The guardian measures; the human judges. Wittgenstein’s boundary is preserved at the interface between system and user.
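The score-to-tier mapping is a one-way translation from measurement to information. The cut-off values below are illustrative assumptions, not the platform's published thresholds.

```python
def confidence_tier(similarity: float) -> str:
    """Map a similarity score to an informational tier.

    Cut-offs here are assumed for illustration. The tier informs the
    human reader; it never triggers an action or asserts safety.
    """
    if similarity >= 0.85:
        return "verified"
    if similarity >= 0.60:
        return "partially verified"
    return "unverified"
```

A score of 0.73, like the one in the example above, would surface as “partially verified”: a prompt for human judgment, not a verdict.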

The “Dig Deeper” feature extends this epistemological commitment to individual claims. When a member expands the source analysis panel, they see each claim mapped to its source (or marked as unmatched). The system does not say “this claim is wrong.” It says “we could not find this claim in your community’s records.” The difference matters: absence of evidence is not evidence of absence, and a system that confuses the two has crossed from measurement into judgment.
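The claim-mapping behaviour can be sketched as follows. The data shape and threshold are hypothetical reconstructions of the feature's externally visible behaviour, not its internals; the essential property is that an unmatched claim is reported as absent, never as false.

```python
def map_claims(claim_matches, threshold=0.70):
    """Report each claim's best source match, or mark it unmatched.

    claim_matches: {claim: [(source_id, similarity), ...]} (assumed shape).
    An unmatched claim is reported as not found in the records, never as
    wrong: absence of evidence is not evidence of absence.
    """
    report = {}
    for claim, matches in claim_matches.items():
        best = max(matches, key=lambda m: m[1], default=None)
        if best is not None and best[1] >= threshold:
            report[claim] = {"source": best[0], "similarity": best[1]}
        else:
            report[claim] = {"source": None,
                             "note": "not found in community records"}
    return report
```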

V. The Adaptive Learning Paradox

Phase 4 of Guardian Agents — adaptive learning — presents the most philosophically challenging design problem. If the system learns from moderator decisions, and moderator decisions are influenced by the system’s recommendations, is the human authority real or performative?

This is a variant of Berlin’s warning about “positive liberty” — the claim that an authority knows a person’s “true” interests better than the person does. If the guardian system’s analysis is so compelling that moderators always follow its recommendations, human authority is formally preserved but functionally eliminated.

The architectural response to this paradox is threefold:

First, the analysis is deterministic, not generative. Phase 4 gathers evidence (historical alerts, baseline deviations, resolution patterns) and applies rule-based classification. No language model inference is involved. The analysis can be fully inspected, fully audited, and fully understood by a moderator. It is a summary of evidence, not a prediction.

Second, the evidence burden is asymmetric. The system requires stronger evidence to recommend loosening restrictions than tightening them. This encodes a substantive value judgment — that the costs of false negatives are higher than false positives — but it encodes it transparently, in auditable configuration, subject to community override.

Third, a regression monitor watches every approved change. If metrics worsen within 24 hours, the change is automatically flagged for review. The system’s own learning is subject to the same evidence-based scrutiny as the AI output it governs. This is Ostrom’s monitoring principle applied reflexively: the governance system monitors itself.
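The regression monitor's core check is simple enough to sketch. This version assumes all metrics are “higher is better”; the real monitor's metric semantics and window mechanics are not specified in the article.

```python
def regressed_metrics(baseline, post_change):
    """Return the names of metrics that worsened after an approved change.

    Metrics are assumed 'higher is better'. Any non-empty result would
    flag the change for human review within the 24-hour watch window.
    """
    return sorted(
        name for name, before in baseline.items()
        if post_change.get(name, before) < before
    )
```

The function never reverses a change itself; it only produces evidence for a human to act on, keeping the reflexive monitoring inside the same measure-then-judge loop as the rest of the architecture.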

Whether these measures are sufficient to preserve genuine human authority is an open question. The honest answer is that no technical architecture can fully prevent automation bias — the tendency for humans to over-rely on automated recommendations. What architecture can do is make the evidence transparent, the reasoning inspectable, and the reversal trivial. Guardian Agents aim for conditions that support genuine human judgment, not conditions that guarantee it.

VI. Market Position and the Limits of Industry Trajectory

Leigh McMullen of Gartner (May 2025) describes guardian agents evolving through three phases — quality control, observation, and protection — all defined as “AI designed to monitor other AI.” Village’s Guardian Agents already encompass all three of Gartner’s phases and add a fourth — Adaptive Learning — that Gartner does not envision.

But the systems this forecast describes — and the sovereign AI infrastructure that companies like IBM are building — fall fundamentally short of what the philosophical commitments described in this article demand. Gartner’s entire model assumes generative verification (AI checking AI), cloud-dependent processing, universal thresholds, automated operation with minimal human governance, and platform-scoped policies. Even IBM’s Sovereign Core — launched in January 2026 as the first enterprise AI designed for local governance — addresses only data residency: where the data sits and who can access it. It does not give the community a constitutional voice in what the AI does with that data.

From the philosophical perspective developed in this article, every one of these five assumptions — generative verification, cloud-dependent processing, universal thresholds, automated operation, and platform-scoped policy — is inadequate.

Village’s Guardian Agents resolve all five because they were derived from these philosophical commitments, not from engineering convenience. The gap between Village’s 2026 deployment and the industry’s 2028 destination is not temporal — it is qualitative. The industry will arrive at guardian agents that monitor AI output. Village has guardian agents that implement constitutional governance. These are different architectures serving different purposes, even when they share a name.

This distinction — governance as constitutional architecture versus governance as automated monitoring — may prove to be the most significant contribution of the Village project to the broader discourse on AI governance. Not because the specific technical choices are universally applicable, but because the methodology is: start with the philosophical commitments, derive the architecture, build the capability within the architecture’s constraints.

The result is a system where safety and capability are not in tension, because safety is the architecture within which capability operates.

References

Alexander, C. (1977). A Pattern Language. Oxford University Press.

Alexander, C. (2002-2004). The Nature of Order (Vols. 1-4). Center for Environmental Structure.

Berlin, I. (1958). Two Concepts of Liberty. In Four Essays on Liberty (1969). Oxford University Press.

Carroll, S. R., et al. (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19(1), 43.

Kukutai, T., & Taylor, J. (Eds.). (2016). Indigenous Data Sovereignty: Toward an Agenda. Australian National University Press.

Ostrom, E. (1990). Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press.

Te Mana Raraunga. (2016). Te Mana Raraunga Charter.

Wittgenstein, L. (1921). Tractatus Logico-Philosophicus. Translated by C. K. Ogden (1922). Routledge & Kegan Paul.

This article is part of the Agentic Governance research programme at My Digital Sovereignty Ltd. Village is currently in beta pilot, accepting applications from communities and organisations ready to participate in the governance architecture described here.

Licence: CC BY 4.0 International
