
What Is Live in Production — An Unvarnished Inventory


Series: Architectural AI Governance at Community Scale — A Technical Examination of Village AI (Article 4 of 5)
Author: My Digital Sovereignty Ltd
Date: March 2026
Licence: CC BY 4.0 International


Scope

This article describes the system as it exists in production as of March 2026. Where a capability is planned but not yet deployed, we say so. Where a capability is deployed but has known limitations, we describe those limitations. The goal is an inventory that a researcher could use to assess the system's maturity and the claims made elsewhere in this series.

Village AI has been in production since October 2025. It is a young system operating at modest scale. The following is a technical description of the deployed architecture.

Model Architecture

Base model: villageai-8b-corrected-v4 — an 8B parameter model serving as the foundation layer for all tenants. Trained on platform operational content: feature documentation, navigation patterns, help query patterns, and community interaction conventions.

Specialised layers: Per-product-type fine-tuned models deployed via a routing layer (model-routing.js). The first production specialisation is villageai-8b-episcopal-v2, fine-tuned on Episcopal/Anglican liturgical, pastoral, and governance content. Additional specialisations (family, conservation, community) are planned but not yet trained.

Model routing: An InferenceRouter selects the appropriate model based on the tenant's product type. If a specialisation exists for the tenant's product type, it is used; otherwise, the base model serves the request. An enhanced tier is defined in the architecture but not yet populated.
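
The selection logic can be sketched as follows. The model names are those given above; the function name, the map name, and the tenant shape are illustrative stand-ins, not the actual model-routing.js implementation.

```javascript
// Base model shared by all tenants (name from the article).
const BASE_MODEL = "villageai-8b-corrected-v4";

// Product types with a deployed specialisation (only "episcopal" today).
const SPECIALISED_MODELS = {
  episcopal: "villageai-8b-episcopal-v2",
};

// Hypothetical sketch of the InferenceRouter's selection step:
// use the specialisation when one exists, else the base model.
function selectModel(tenant) {
  return SPECIALISED_MODELS[tenant.productType] ?? BASE_MODEL;
}
```

A tenant of product type "family" would fall through to the base model until a family specialisation is trained.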

Inference hardware: Primary inference runs on an AMD RX 7900 XTX GPU, accessed via WireGuard VPN from the application server (OVH France). A CPU fallback running a degraded 3B-parameter model preserves availability during GPU outages. The GPU is not co-located with the application server — inference requests traverse a VPN tunnel, adding network latency to the inference path.
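
The GPU-primary, CPU-fallback behaviour described above can be sketched with injected inference functions. This is an illustrative shape only; the real router's error handling and model selection are not published here.

```javascript
// Hypothetical sketch: try the 8B model on the remote GPU first;
// on failure (GPU or VPN outage), fall back to the 3B CPU model.
async function generateWithFallback(prompt, gpuInfer, cpuInfer) {
  try {
    // Primary path: RX 7900 XTX reached over the WireGuard tunnel.
    return await gpuInfer(prompt);
  } catch (err) {
    // Degraded path: keeps the service available at reduced quality.
    return await cpuInfer(prompt);
  }
}
```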

Framework: Inference is managed through Ollama, with the InferenceRouter handling model selection and request routing. The system does not use any third-party inference API; all generation occurs on controlled infrastructure.

Retrieval-Augmented Generation

Vector store: Qdrant, storing embeddings of community content (stories, announcements, documents, event descriptions, governance records).

Embedding pipeline: The EmbeddingService processes community content into vector representations. Content is chunked, embedded, and indexed per tenant, maintaining strict tenant isolation at the vector store level.

Retrieval at inference time: User queries are embedded and used for cosine similarity search against the tenant's document corpus. Retrieved documents are provided as context to the generation model, grounding its responses in the community's actual content.
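
The scoring step can be illustrated with an in-memory sketch. In production this search runs inside Qdrant against the tenant's collection; the code below only shows what cosine-similarity top-k retrieval computes.

```javascript
// Cosine similarity between two embedding vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the top-k documents for an embedded query, highest score first.
// The retrieved documents are then supplied as generation context.
function retrieve(queryVec, docs, k = 3) {
  return docs
    .map((d) => ({ ...d, score: cosine(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```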

Content indexing: A ContentIndexer service processes new and updated content into the vector store. Indexing respects consent boundaries — content not explicitly shared for AI use is not indexed.
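
The consent gate amounts to a filter ahead of embedding. The field name `aiConsent` below is a hypothetical stand-in for whatever flag the ContentIndexer actually checks.

```javascript
// Only content explicitly shared for AI use proceeds to chunking
// and embedding; everything else is skipped, not indexed.
function indexable(items) {
  return items.filter((item) => item.aiConsent === true);
}
```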

Guardian Agent Pipeline

Every AI response passes through four Guardian Agent layers before reaching the user. The pipeline is implemented in src/services/guardians/ and is structurally independent of the generation model.

Layer 1: AccuracyVerifier

Layer 2: HallucinationDetector

Layer 3: AnomalyDetector + PressureMonitor

Layer 4: ResponseReviewer + RegressionMonitor + Adaptive Feedback
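
The overall shape of the pipeline — sequential layers, each able to pass, annotate, or veto a response — can be sketched as below. The layer names above are from the article; their internals are not published, so each layer here is a stub with an assumed `check` interface.

```javascript
// Hypothetical sketch of a sequential guardian pipeline. Each layer
// receives the response (possibly annotated by earlier layers) and
// may mark it blocked, in which case later layers are skipped.
async function runGuardians(response, layers) {
  for (const layer of layers) {
    response = await layer.check(response);
    if (response.blocked) break; // stop at the first veto
  }
  return response;
}
```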

Pre-Inference Protection

A PreInferenceProtector operates before generation, screening inputs for injection patterns and routing certain query types directly to human review. This is a conservative filter — it errs on the side of blocking — and is separate from the post-generation Guardian pipeline.
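
A conservative pattern screen of this kind can be sketched as follows. The regexes are examples only — the real PreInferenceProtector's rule set and routing criteria are not published.

```javascript
// Illustrative injection patterns; a real rule set would be broader.
const INJECTION_PATTERNS = [
  /ignore (all |your )?previous instructions/i,
  /you are now .*(unfiltered|jailbroken)/i,
  /reveal (your )?system prompt/i,
];

// Err on the side of blocking: suspicious inputs never reach the
// model and are routed to human review instead.
function screenInput(text) {
  if (INJECTION_PATTERNS.some((p) => p.test(text))) {
    return { allowed: false, action: "human_review" };
  }
  return { allowed: true, action: "generate" };
}
```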

What the System Can Do Today

Community-grounded question answering. Given a query about community content ("When is the next vestry meeting?", "What did the rector say about the building fund?"), the system retrieves relevant documents and generates a response grounded in that content. If no relevant documents are found, the system indicates this rather than generating from the base model's priors.

Drafting assistance. The system can generate draft bulletins, announcements, and correspondence that reflect the community's tone and vocabulary. All drafts are reviewed by a moderator before publication.

Document summarisation. Long documents (vestry minutes, policy documents) can be summarised with key points extracted.

Translation support. The platform supports five languages: English, German, French, Dutch, and te reo Māori. Translation uses DeepL (not the generation model) for accuracy.

Feedback triage. Member feedback is automatically classified, investigated where possible, and routed to the appropriate moderator. The HelpFeedbackSweepService and GeneralFeedbackProcessor handle automated investigation and resolution.

OCR and document processing. The DocumentExtractor service processes scanned documents, making their content searchable and available for RAG retrieval.

Vocabulary System

The vocabulary system (product-vocabularies.js, vocabulary.js) adapts the platform's terminology to the community type. This operates at two levels:

Interface level: UI labels, navigation terms, and feature names are replaced with domain-appropriate vocabulary. An Episcopal parish sees "parishioners," "vestry governance," and "parish bulletins" rather than generic platform terminology.

Model level: The vocabulary shapes the context provided to the model. When the system refers to "parishioners" rather than "users" in the prompt context, the model's output reflects that framing. This is a lightweight intervention — it operates at the prompt level, not the weight level — but it reduces the friction between the model's distributional priors and the community's terminology.

Nine product types are defined: community, family, conservation, diaspora, clubs, business, alumni, whanau, and episcopal. Each has a distinct vocabulary mapping.
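
At the model level, the intervention described above amounts to term substitution in the prompt context. The Episcopal terms come from the article; the helper itself is an illustrative sketch, not the actual vocabulary.js implementation.

```javascript
// Subset of a hypothetical Episcopal vocabulary mapping.
const EPISCOPAL_VOCAB = {
  users: "parishioners",
  governance: "vestry governance",
  bulletins: "parish bulletins",
};

// Whole-word substitution over the context handed to the model,
// so the prompt says "parishioners" where the platform says "users".
function applyVocabulary(text, vocab) {
  return Object.entries(vocab).reduce(
    (out, [generic, domain]) =>
      out.replace(new RegExp(`\\b${generic}\\b`, "g"), domain),
    text
  );
}
```

Because this operates on the prompt rather than the weights, it is cheap to maintain per product type but cannot override the model's priors on its own.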

What Is Not Yet Proven

We enumerate specific claims that have not been validated:

Guardian Agent efficacy under adversarial conditions. The system has not been subjected to systematic red-teaming. Guardian Agent performance under adversarial prompting, deliberate attempts to elicit hallucination, or coordinated injection attacks is unknown.

Specialised layer generalisation. The Episcopal specialisation (villageai-8b-episcopal-v2) has been deployed for one product type. Whether the Specialised Layer strategy generalises effectively to other domains (conservation ecology, te reo Māori cultural contexts, family genealogy) has not been empirically demonstrated.

Cosine similarity threshold calibration. The similarity thresholds used by the AccuracyVerifier were set based on development testing and early production experience. They have not been optimised through systematic evaluation against a labelled dataset of grounded and ungrounded responses.

Long-term distributional stability. The system has been in production for approximately five months. Whether the base model's priors reassert themselves over time — a slow drift back towards training distribution despite fine-tuning — has not been observed over a sufficient time horizon to draw conclusions.

Cross-lingual verification. For communities operating in languages other than English, the Guardian Agent pipeline operates on embeddings of the non-English text. Whether cosine similarity verification is equally effective across languages has not been systematically evaluated.

Feedback loop convergence. The adaptive feedback mechanism (Layer 4) is designed to improve system behaviour over time. Whether it converges to stable, improved performance or exhibits oscillatory or divergent behaviour under certain feedback patterns has not been formally analysed.

We present these not as deferrals but as open questions. The system is operational; these questions are unanswered.

Infrastructure

All inference occurs within the operator's infrastructure. No prompts, responses, or community content are transmitted to third-party AI providers.


This is Article 4 of 5 in the "Architectural AI Governance at Community Scale" series. For the full technical architecture, visit Village AI on Agentic Governance.

Previous: Why Training-Time Governance Fails — Architectural Constraints as an Alternative
Next: Beyond the Model — Platform Architecture and Governance Integration

Published under CC BY 4.0 by My Digital Sovereignty Ltd. You are free to share and adapt this material, provided you give appropriate credit.