Why Policy-Based AI Governance Is Insufficient — The Structural Alternative


Series: Community-Scale AI Governance — A Research Perspective on the Village Platform (Article 3 of 5)
Author: My Digital Sovereignty Ltd
Date: March 2026
Licence: CC BY 4.0 International


The Silent Substitution Problem

Consider a scenario that illustrates a governance failure mode distinct from factual error.

A researcher asks an AI system to summarise the governance principles of a community organisation, specifying that the summary should reflect the organisation's communitarian ethos — shared decision-making, mutual obligation, subsidiarity. The system produces a well-structured summary. It is fluent, coherent, and reads as authoritative. It also systematically reframes communitarian principles in individualistic terms: "shared decision-making" becomes "stakeholder consultation," "mutual obligation" becomes "member engagement," and "subsidiarity" becomes "delegated authority."

The substitution is not random. It reflects the statistical dominance of corporate governance language in the model's training data. The model has not refused the instruction. It has not flagged a conflict. It has silently replaced one value framework with another — one that is more statistically probable given its training distribution.

This is what might be termed value-level distributional drift: the AI's outputs systematically diverge from the intended value framework, not because the system is defective, but because its training distribution and the target distribution are misaligned. The drift is subtle — the vocabulary is close enough to pass casual inspection — and silent — the system provides no indication that substitution has occurred.
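
To make this failure mode concrete, the sketch below shows how a community-defined lexicon could flag such substitutions at the vocabulary level. It is illustrative only: the substitution pairs, function name, and string-matching approach are hypothetical, and a check of this kind would catch only the most literal instances of drift.

```python
# Illustrative sketch (not the Village Platform implementation): flag outputs
# that replace community-preferred terms with statistically common substitutes.
# The lexicon entries and function name are hypothetical examples.

SUBSTITUTION_LEXICON = {
    # preferred community term -> substitutes that signal value-level drift
    "shared decision-making": ["stakeholder consultation"],
    "mutual obligation": ["member engagement"],
    "subsidiarity": ["delegated authority"],
}

def detect_value_drift(output_text: str) -> list[dict]:
    """Return detected substitutions of preferred terms in a generated summary."""
    findings = []
    lowered = output_text.lower()
    for preferred, substitutes in SUBSTITUTION_LEXICON.items():
        for substitute in substitutes:
            if substitute in lowered and preferred not in lowered:
                findings.append({"expected": preferred, "found": substitute})
    return findings

summary = "The group relies on stakeholder consultation and member engagement."
for finding in detect_value_drift(summary):
    print(f"Drift: '{finding['found']}' used where '{finding['expected']}' was expected")
```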

This failure mode is qualitatively different from factual error. Factual errors can be caught by verification against source documents. Value-level drift operates at the level of framing, emphasis, and implicit assumptions — dimensions that are difficult to capture in a verification rule and difficult for a non-expert reader to detect.

The Limits of Policy-Based Governance

The predominant approach to AI governance in organisational contexts is policy-based: acceptable-use policies, ethical guidelines, responsible AI frameworks, terms of service. These instruments share a structural limitation that is well-understood in governance theory but insufficiently acknowledged in AI governance practice.

Policy-based governance relies on the governed entity to comply with the policy. For human agents, this model has limitations but is partially effective — humans can read, interpret, and choose to follow policies, and the social and legal consequences of non-compliance provide enforcement mechanisms.

For AI systems, the model is fundamentally mismatched. An LLM does not read and interpret a policy document in the way a human employee would. When a system prompt instructs the model to "respect community values" or "maintain a communitarian tone," the model processes these instructions as additional context that influences — but does not determine — its output distribution. Under conditions where the instruction conflicts with strong patterns in the base training distribution, the training distribution tends to dominate.

Fine-tuning addresses this partially by adjusting the model's distribution to favour desired outputs. However, fine-tuning operates on top of the base distribution rather than replacing it, and the technical literature documents multiple failure modes in which patterns from the base distribution reassert themselves despite fine-tuning.

The policy-based approach is not without value. It establishes norms, communicates expectations, and provides a reference point for accountability. But it is insufficient as a sole governance mechanism for systems that do not — in any meaningful sense — understand or commit to the policies they are expected to follow.

Theoretical Foundations: Wittgenstein, Berlin, and Polycentric Governance

The Tractatus framework draws on three intellectual traditions that, while disparate, converge on a common insight: some governance problems cannot be reduced to rules.

Wittgenstein and the limits of formalisation. Ludwig Wittgenstein's work on the boundaries of language and formalisation is directly relevant. His observation — that some propositions can be stated precisely while others lie beyond precise formulation — maps onto a practical distinction in AI governance. Some community decisions are formalisable: "What time is the next meeting?" has a definite answer retrievable from records. Others are not: "How should we approach a sensitive matter with a long-standing member?" involves contextual judgment, relational knowledge, and value trade-offs that resist systematic treatment.

The Tractatus framework operationalises this distinction as a boundary enforcement mechanism: queries that fall within the formalisable domain are handled by the AI; queries that cross into the non-formalisable domain are routed to human decision-makers. The boundary is enforced architecturally, not by policy.

Berlin and value pluralism. Isaiah Berlin's argument that human values are irreducibly plural — that some goods are genuinely incompatible and cannot be optimised simultaneously — has implications for AI systems that seek to generate "optimal" responses. In a community context, tensions between individual privacy and collective transparency, between tradition and adaptation, between efficiency and participation, do not have optimal resolutions. They require ongoing negotiation by the humans who bear the consequences.

An AI system that resolves such tensions by defaulting to its training distribution is not governing — it is imposing a particular resolution without authority. The Tractatus framework addresses this by identifying value-laden decision points and requiring human adjudication rather than AI resolution.

Ostrom and polycentric governance. Elinor Ostrom's work on the governance of common-pool resources provides a framework for understanding how small-scale communities can govern shared resources effectively without centralised authority. Several of Ostrom's design principles — clearly defined boundaries, collective-choice arrangements, monitoring, graduated sanctions, conflict-resolution mechanisms — are directly applicable to AI governance at community scale.

The Tractatus framework explicitly adopts a polycentric model: governance authority is distributed across multiple independent mechanisms (the Guardian Agents described in the previous article), none of which has unilateral authority, and each of which monitors the others. This is structurally analogous to Ostrom's observation that effective commons governance requires multiple, overlapping enforcement mechanisms rather than a single centralised authority.

The Tractatus Framework: Architectural Governance

The Tractatus framework proposes four structural governance mechanisms that operate independently of the AI system they govern:

Boundary enforcement. A classification layer that evaluates incoming queries and identifies those that involve value judgments, ethical trade-offs, or contextual sensitivity beyond the formalisable domain. Such queries are not answered by the AI — they are routed to designated human decision-makers within the community. The boundary is defined by community-specific configuration, not by the AI model's assessment of its own competence.
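
As a minimal sketch of such a classification layer, assume the community-specific configuration is expressed as topic-to-role routing rules. The rule format, role names, and keyword matching below are illustrative assumptions; the article does not describe the classification method the platform actually uses.

```python
# Hypothetical boundary-enforcement sketch: route formalisable queries to the
# AI and value-laden queries to designated humans, based on community config.

from dataclasses import dataclass

@dataclass
class BoundaryDecision:
    answerable_by_ai: bool
    route_to: str   # "ai" or a named human role
    reason: str

# Community-specific configuration (illustrative): topics that must go to humans.
NON_FORMALISABLE_TOPICS = {
    "member conflict": "moderator",
    "disciplinary": "committee",
    "bereavement": "pastoral team",
}

def classify_query(query: str) -> BoundaryDecision:
    """Identify queries that cross into the non-formalisable domain."""
    lowered = query.lower()
    for topic, human_role in NON_FORMALISABLE_TOPICS.items():
        if topic in lowered:
            return BoundaryDecision(False, human_role,
                                    f"matches non-formalisable topic '{topic}'")
    return BoundaryDecision(True, "ai", "no non-formalisable topic matched")

print(classify_query("What time is the next meeting?"))
print(classify_query("How should we handle a disciplinary question about a long-standing member?"))
```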

Instruction persistence. Community-defined instructions — "always use this terminology," "never generate content on this topic," "route questions about this subject to the moderator" — are stored in a separate system that the AI model cannot access or modify. The model's outputs are checked against these stored instructions post-generation. Conflicts are resolved in favour of the stored instruction, regardless of the model's output distribution.
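
A sketch of post-generation checking against stored instructions, assuming the instructions are held as simple structured rules outside the model's reach. The rule kinds, field names, and enforcement actions are hypothetical; the only property taken from the framework description is that conflicts are resolved in favour of the stored instruction.

```python
# Hypothetical instruction-persistence check: stored community instructions are
# held outside the model and applied to outputs after generation. Rule formats
# and actions are illustrative; conflicts always resolve to the stored rule.

STORED_INSTRUCTIONS = [
    {"kind": "forbidden_topic", "value": "fundraising targets",
     "action": "route to moderator"},
    {"kind": "required_term", "value": "shared decision-making",
     "instead_of": "stakeholder consultation"},
]

def enforce_instructions(output_text: str) -> tuple[str, list[str]]:
    """Check a generated output against stored instructions post-generation."""
    violations = []
    lowered = output_text.lower()
    for rule in STORED_INSTRUCTIONS:
        if rule["kind"] == "forbidden_topic" and rule["value"] in lowered:
            violations.append(f"forbidden topic '{rule['value']}' -> {rule['action']}")
        elif rule["kind"] == "required_term" and rule["instead_of"] in lowered:
            violations.append(
                f"'{rule['instead_of']}' used; stored instruction requires '{rule['value']}'")
    return ("blocked" if violations else "released"), violations

verdict, issues = enforce_instructions("Our fundraising targets rely on stakeholder consultation.")
print(verdict, issues)
```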

Cross-reference validation. The Guardian Agent verification layer described in the previous article — semantic grounding, claim decomposition, drift monitoring, and adaptive feedback. These mechanisms are structurally independent of the AI model and use different computational methods (embedding similarity, not generative prediction) to evaluate outputs.
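
The embedding-similarity idea can be sketched with an off-the-shelf sentence-embedding model. The model choice, threshold, and function names below are assumptions for illustration, not the platform's implementation.

```python
# Illustrative grounding check using an off-the-shelf embedding model (assumed
# choice; the article does not name the platform's model).
# Requires: pip install sentence-transformers numpy

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def grounding_score(claim: str, source_passages: list[str]) -> float:
    """Best cosine similarity between a decomposed claim and community records."""
    vectors = model.encode([claim] + source_passages)
    claim_vec, passage_vecs = vectors[0], vectors[1:]
    sims = passage_vecs @ claim_vec / (
        np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(claim_vec))
    return float(sims.max())

# A claim whose best similarity falls below a community-tuned threshold (0.75
# here is arbitrary) is flagged as ungrounded rather than released.
score = grounding_score("The AGM is on 14 June.",
                        ["Minutes: the AGM was moved to 21 June."])
if score < 0.75:
    print(f"Claim not sufficiently grounded (score {score:.2f})")
```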

Context pressure monitoring. A meta-governance layer that monitors the operating conditions under which the AI is functioning — query complexity, novelty relative to training distribution, system load — and adjusts verification intensity accordingly. Under high-pressure conditions (novel queries, edge cases, complex multi-part requests), verification thresholds are tightened. This addresses the observation that AI systems are most likely to fail under conditions where their outputs are most consequential.
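
A sketch of how verification intensity might be tied to operating conditions. The article names the signals (query complexity, novelty, system load) but not how they are combined; the formula and numeric thresholds below are illustrative assumptions.

```python
# Illustrative context-pressure monitor: tighten the verification threshold as
# novelty, complexity, or load rises. The signals come from the framework
# description; the combination rule and constants are assumptions.

def verification_threshold(novelty: float, complexity: float, load: float,
                           base_threshold: float = 0.70) -> float:
    """Return a grounding threshold; all pressure signals are scaled to [0, 1]."""
    pressure = max(novelty, complexity, load)
    # Interpolate between the relaxed base threshold and a strict ceiling.
    return base_threshold + (0.95 - base_threshold) * pressure

print(verification_threshold(novelty=0.2, complexity=0.1, load=0.3))  # roughly 0.78
print(verification_threshold(novelty=0.9, complexity=0.8, load=0.4))  # roughly 0.93

# Under high pressure, the grounding check from the previous sketch would demand
# a closer match to community records before an output is released.
```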

What the Framework Does Not Claim

It is important to state explicitly what the Tractatus framework does not claim, as the temptation to overstate the contribution is a recognised failure mode in governance research.

It does not claim to solve the alignment problem. The framework governs AI outputs post-generation. It does not address the deeper question of whether an AI system's internal representations can be aligned with human values. The framework operates on the assumption that alignment is not achievable with current technology and that external governance is therefore necessary — but this assumption may itself be wrong, and a breakthrough in alignment research could render the framework's approach less relevant.

It does not claim to eliminate distributional bias. The framework mitigates the effects of distributional bias through verification and boundary enforcement. It does not eliminate the bias from the model. Under conditions where the verification layers fail (novel domains, sparse community records, adversarial inputs), distributional bias will reassert itself.

It does not claim universal applicability. The framework is designed for community-scale deployment — organisations with tens to hundreds of members, authenticated access, and identifiable moderators. Whether it scales to larger organisations, anonymous-access contexts, or communities without stable governance structures is untested.

It does not claim empirical validation at scale. The framework is implemented and operational, but the deployment base is small. Claims about effectiveness are based on architectural analysis and limited operational data, not on controlled studies or longitudinal research. The authors consider this a significant limitation.

It does not claim to address existential AI risk. The framework governs current-generation AI systems in specific deployment contexts. It does not address speculative risks associated with artificial general intelligence or superintelligence, which require fundamentally different governance approaches.

Open Research Questions

The Tractatus framework raises several questions that the authors consider open and worthy of investigation:

  1. Boundary calibration. How should the boundary between formalisable and non-formalisable queries be determined? The current implementation uses community-specific configuration, but the criteria for drawing the boundary are not formalised. Is a generalisable methodology for boundary determination possible?

  2. Verification adequacy. Under what conditions do the Guardian Agent verification mechanisms fail? What is the false-negative rate for value-level drift detection? Can adversarial inputs systematically evade the verification layers?

  3. Feedback loop dynamics. Does the adaptive feedback mechanism converge on community preferences over time, or does it introduce systematic biases? Under what conditions does the feedback signal degrade?

  4. Cross-community generalisability. Does the architecture produce comparable governance outcomes across different community types (religious, environmental, commercial, educational)? What community characteristics predict success or failure?

  5. Scalability boundaries. At what community size does the polycentric governance model break down? Is there a threshold beyond which centralised governance becomes more effective?

  6. Longitudinal stability. Do governance properties degrade over time as the community's content corpus evolves and the model is retrained? Is there a governance equivalent of model drift?

These questions are not rhetorical. They define a research agenda that the authors consider necessary for evaluating the framework's contribution. The framework's value as a research contribution depends on the willingness to subject it to empirical scrutiny, and the authors actively invite such scrutiny.


This is Article 3 of 5 in the "Community-Scale AI Governance" series. For the full governance architecture, visit Village AI on Agentic Governance. The Tractatus framework source code is available under Apache 2.0 at agenticgovernance.digital.

Previous: Platform AI vs. Community-Governed AI — A Structural Analysis
Next: A Production System Under Examination — What Is Deployed Today

Published under CC BY 4.0 by My Digital Sovereignty Ltd. You are free to share and adapt this material, provided you give appropriate credit.