🤖 AI Research Edition Article 6 of 7

Open high-country tussock under a wide sky

Beyond the Model — Platform Architecture and Governance Integration

The Model Is Not the System

The preceding articles examined the generation model, the Guardian Agent verification pipeline, and the distributional bias problem. This final article examines how architectural governance extends beyond the model into the platform, and evaluates the overall approach against what it sacrifices and what it gains. (Series-specific terminology is defined in the glossary.)

The central claim is that AI alignment at community scale cannot be solved by the model alone — not by training, not by fine-tuning, not by RLHF, and not by inference-time verification in isolation. Alignment at deployment requires architectural constraints that span the entire system: data isolation, consent architecture, vocabulary framing, human oversight integration, and federated governance. The model is one component. The architecture is the intervention.

Data Isolation as an Alignment Mechanism

Multi-tenant data isolation is typically discussed as a security concern. In the Village architecture, it also functions as an alignment mechanism.

Every database query is filtered by tenantId. The vector store maintains tenant-scoped collections. The generation model receives context only from the querying tenant's corpus. These are standard multi-tenancy patterns, but they have an alignment consequence: the model cannot draw on distributional patterns from other tenants' data.

This matters because alignment in community contexts is not universal. What constitutes appropriate language for an Episcopal parish may be inappropriate for a conservation group, and vice versa. A model that has access to all tenants' data — even read-only, even for retrieval — would develop distributional priors that blend across communities. Tenant isolation prevents this cross-contamination at the data layer.

The architectural principle is: the model's context window should contain only content from the community it is currently serving. This is enforced structurally, not by instruction. The model does not need to be told to stay within the tenant's boundary; it has no access to anything outside it.

Consent Architecture

The consent system (ConsentRecord model, AIMemoryConsent component) governs what content enters the AI pipeline. Three distinct consent purposes are defined: ai_triage_memory, ai_ocr_memory, and ai_summarisation_memory. Content is not indexed for AI use unless the content creator has granted explicit consent for the relevant purpose.

This is an alignment constraint that operates before inference. Content that has not been consented for AI use does not appear in the vector store, is not retrieved during RAG, and is not available as reference material for Guardian Agent verification. The model cannot hallucinate based on content it has never seen.

The consent architecture also addresses a subtler problem: community members who are uncomfortable with AI processing their contributions can exclude their content without affecting the system's ability to serve other members. This is a governance mechanism as much as a privacy mechanism — it allows the community to shape the AI's knowledge base through individual consent decisions.

Limitation: Consent operates at the content level, not at the information level. If member A writes a story mentioning member B, and member A consents to AI processing, information about member B enters the AI pipeline regardless of member B's preferences. This is an inherent limitation of content-level consent that we have not fully resolved.

Vocabulary as Framing Governance

Article 4 described the vocabulary system's interface and model-level effects. Here we examine it as a governance mechanism.

The vocabulary system implements what might be called framing governance: it constrains the conceptual frame within which the model operates. When the system substitutes "parishioners" for "users" and "vestry governance" for "admin settings" throughout the prompt context, it shifts the model's conditional distribution away from technology-platform patterns and towards community-governance patterns.

This is a weaker intervention than fine-tuning — it operates at the prompt level, not the weight level — but it has two advantages:

It is transparent and auditable. The vocabulary mappings are defined in a single configuration file (product-vocabularies.js). A researcher can inspect exactly which terms are substituted and predict their effect on model behaviour.
It is community-configurable. Different product types have different vocabulary mappings, and these can be extended without retraining the model. This is relevant for communities whose terminology does not fit any existing product type.

The interaction between vocabulary framing and the Specialised Layer fine-tuning is worth noting. The vocabulary system shifts the prompt context; the fine-tuning shifts the model's distributional priors. When both operate together — the prompt uses Episcopal vocabulary and the model has Episcopal fine-tuning — the combined effect is stronger than either intervention alone. When only one operates (a community type without a specialised model, using only vocabulary framing), the effect is weaker but still measurable in output quality.

Human Oversight Integration

The boundary enforcer (described in Article 3 of the parish series as a governance component) routes questions involving values, ethics, or cultural context to human review. This is implemented through the PreInferenceProtector and through confidence-based routing: when Guardian Agent verification produces confidence below a configurable threshold, the response is flagged for moderator review rather than delivered directly.

This creates a human-in-the-loop architecture where the AI handles high-confidence, well-grounded queries autonomously and escalates uncertain or sensitive queries to human oversight. The threshold is configurable per tenant, allowing communities to set their own risk tolerance.

Limitation: The quality of human oversight depends on the quality of the human moderators. The system can route uncertain queries to a moderator, but it cannot ensure the moderator has the domain expertise to evaluate them effectively. This is an organisational constraint, not a technical one, but it bounds the effectiveness of the overall architecture.

The moderator accreditation path — structured training for community members taking on the moderator role — is designed to address this limitation but is being rolled out progressively.

Federation and Inter-Community Governance

The federation architecture allows distinct Village instances to establish bilateral connections — sharing selected content across community boundaries while maintaining data sovereignty. Both communities must consent to the connection, and either can withdraw at any time.

From an alignment perspective, federation introduces a controlled channel through which distributional patterns from one community can influence another. A federated content exchange between an Episcopal parish and a conservation group could, in principle, shift the receiving community's AI behaviour by introducing out-of-domain content into the vector store.

The federation architecture addresses this through selective sharing — only content explicitly marked for federation is shared — and through tenant-scoped verification. Guardian Agent verification operates on the receiving community's corpus, which includes federated content only after it has been accepted and indexed. The receiving community's moderators control what federated content enters their AI pipeline.

This is a governance mechanism that has no analogue in the alignment literature, because the alignment literature does not typically consider multi-community deployment as a first-class concern. We note it as an area where deployed community AI systems face alignment challenges that laboratory settings do not capture.

What This Approach Sacrifices

We enumerate the costs of this approach clearly:

Raw capability. A 14B-parameter model cannot match frontier systems on general tasks. Users who need creative writing, complex reasoning across unfamiliar domains, or broad-spectrum intellectual assistance will find this system inadequate.

Latency. The Guardian Agent pipeline adds verification overhead to every response. The four-layer pipeline, including embedding computation, cosine similarity search, claim decomposition, and anomaly checking, introduces measurable latency. For communities that prioritise response speed over verification rigour, this is a cost.

Coverage. The system's domain fidelity depends on the quality and coverage of the fine-tuning data and the community's content corpus. A newly established community with minimal content provides a sparse reference corpus, making Guardian Agent verification less effective and model behaviour less grounded.

Scalability. The architecture is designed for community-scale deployment (tens to low hundreds of concurrent users per tenant). It has not been tested at internet scale, and the per-response verification pipeline would likely require substantial architectural changes to operate at high throughput.

Generalisability. The Specialised Layer strategy has been deployed for several product types (community, whanau, episcopal, family, business) but validated for none — deployment is not efficacy evidence. Whether it generalises to all nine defined product types, and whether the Guardian Agent thresholds require per-domain calibration, is unproven.

What This Approach Gains

Verifiability. Every AI response can be traced to specific source documents. The cosine similarity scores, the claim-level verification results, and the confidence indicators are available for inspection. This is a property that frontier systems operating on unbounded training corpora cannot offer.

Bounded agency. Because the platform — not the open web — is the action surface, the system's agentic capacity is bounded by construction: it acts only within tenant-isolated data, on reversible operations, with consequential actions gated to human authorisation. The agentic turn across the wider field (Article 1) makes this property more salient, not less: where general-purpose agents expand the action space to the internet, the platform-bounded design contracts it to an inspectable, recoverable surface. This is a governance gain that is only legible at the platform level, not the model level — which is the broader point of this article.

Auditability. The fine-tuning data, the vocabulary mappings, the Guardian Agent thresholds, and the feedback loop corrections are all inspectable. A researcher or auditor can examine the full chain from input to output and understand why the system produced a specific response. The Tractatus framework is published as open source; the governance architecture is open to external review.

Community sovereignty. The community controls the data, the inference infrastructure, the vocabulary framing, the consent boundaries, and the moderation policy. No third-party provider can change the system's behaviour without the community's consent. This is a governance property, not a technical one, but it is architecturally enforced.

Epistemic separation. The verification system operates on different principles from the generation system. This does not prove correctness, but it provides a detection mechanism for the specific failure mode — silent distributional reversion — that motivated the architecture. The 27027 incident would be caught by the Guardian Agent pipeline, because the cosine similarity between therapeutic bereavement language and the community's theological corpus would fall below the verification threshold.

Falsifiability. The system makes specific, testable claims: that Guardian Agent verification reduces ungrounded responses, that domain specialisation improves register fidelity, that vocabulary framing shifts model behaviour measurably. These claims are, in principle, independently testable. We have not yet arranged independent testing, but the architecture does not resist it.

Open Questions for the Research Community

We conclude with questions we cannot answer ourselves and would welcome engagement on:

Is epistemic separation sufficient for alignment, or merely necessary? The Guardian Agent architecture provides detection of distributional reversion. Detection is not prevention. Is there a theoretical basis for arguing that detection-and-correction converges to alignment, or does it merely bound the frequency of failures?
How should cosine similarity thresholds be calibrated? The current thresholds are empirically tuned. Is there a principled method for setting verification thresholds that balances false positive rate (flagging grounded responses as ungrounded) against false negative rate (passing ungrounded responses)?
Does the correlated embedding vulnerability have practical mitigations? The shared embedding model used for both retrieval and verification creates a single point of failure. What architectures might provide genuinely independent verification while remaining computationally tractable?
Can the Specialised Layer strategy be formalised? The intuition — domain-specific fine-tuning on a smaller model yields better domain fidelity than prompting a larger model — is empirically supported in our deployment but has not been rigorously compared. Under what conditions does this hold, and when does it break down?
What evaluation frameworks apply to community-scale alignment? Standard alignment benchmarks evaluate general-purpose safety properties. What benchmarks would be appropriate for evaluating domain-specific alignment — fidelity to a specific community's norms, vocabulary, and values?

These questions are beyond the scope of a single deployment team. We raise them because the alignment problem at community scale — prosaic, operationally consequential, and largely ignored by the research community — deserves more attention than it currently receives.

Platform and founding-community programme: My Digital Sovereignty. Full AI architecture: Village AI on Agentic Governance. Practitioner courses on operating these systems under human control: Working with Claude and Agents at Work.

Useful? Share this article, or show a QR code to scan.