⚠️
This demo contains entirely synthetic constituent data. No real names, addresses, phone numbers, or casework details. The synthetic data portrays a fictitious Maryland Senate office ("Senator John Doe") with realistic volume, geographic spread, and issue distribution.
Congressional bill data is real and public, ingested from Congress.gov.
Reference Architecture Demonstration

From Volume to Voice

AI Infrastructure for a U.S. Senate Office — built, demonstrated, deployable.
Launch Demo → 📄 Briefing PDF or read on for context

A large-state Senate office is not a communications operation — it is a signal-processing organization that also does communications. Tens of thousands of constituent contacts per week. Dozens of bills per legislative day. Press inquiries, casework, social mentions, meeting requests arriving faster than any 60-person staff can absorb. Most of this data is read once, then disappears.

This demo shows what changes when the office runs its own AI infrastructure — on its own hardware, inside the Senate enclave, with constituent PII that never leaves the building.

What's in this demo

- 69,398 real Congressional bills
- 50,565 bills with summaries indexed
- 60,000 synthetic casework cases
- 145,520 synthetic phone-call logs
- 107,857 generated PDF records
- 158,120 bge-m3 vector chunks

How the data gets here

Two data sources, two different stories:

Bills — real, from Congress.gov

Initial corpus ingested directly from the Congress.gov public data API. Bill text, sponsors, cosponsors, action history, policy areas, and committee assignments. Each bill summary is chunked, embedded with the bge-m3 model (1024-dim vectors), and stored in pgvector with HNSW index for semantic search.
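The chunk-embed-store step above can be sketched as follows. This is an illustrative outline, not the demo's actual schema: the table name, column names, and chunk sizes are assumptions; the 1024-dimension vector column and HNSW index match the bge-m3 setup described in the text.

```python
# Sketch of the bill-summary indexing pipeline. Table/column names and
# chunking parameters are illustrative assumptions.

def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split a summary into overlapping character windows for embedding."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# DDL a pgvector-backed store might use (1024 dims matches bge-m3):
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS bill_chunks (
    id        bigserial PRIMARY KEY,
    bill_id   text NOT NULL,
    chunk     text NOT NULL,
    embedding vector(1024)
);
CREATE INDEX IF NOT EXISTS bill_chunks_hnsw
    ON bill_chunks USING hnsw (embedding vector_cosine_ops);
"""

# Query side: nearest-neighbor semantic search by cosine distance.
SEARCH_SQL = """
SELECT bill_id, chunk
FROM bill_chunks
ORDER BY embedding <=> %(query_vec)s
LIMIT 10;
"""
```

The overlap between adjacent chunks keeps sentences that straddle a chunk boundary retrievable from either side.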

Daily auto-refresh is planned; currently the corpus covers the 117th and 118th Congresses as of the initial ingest.

Constituent data — entirely synthetic

Generated to reflect realistic volume, MD geographic spread, and casework complexity for a large Maryland office. Every name, address, phone number, and case detail is fictitious. Each generated document (outcome letter, incoming letter, email, phone screen, walk-in form, web-form submission) is rendered as a real PDF and stored on disk — the same way scanned office records would exist in a real deployment.
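A minimal sketch of how one such synthetic record could be generated, assuming a seeded generator for reproducibility. The field names, city list, and issue categories are illustrative assumptions; only the document types and the watermark string come from the text. PDF rendering is omitted here.

```python
# Fully synthetic casework record generator (sketch). Seeded randomness
# makes the corpus reproducible; every record carries the watermark the
# text describes. Field names and value lists are assumptions.
import random

WATERMARK = "MOCK DATA — FOR TESTING ONLY"
MD_CITIES = ["Baltimore", "Rockville", "Frederick", "Salisbury", "Hagerstown"]
ISSUES = ["veterans", "immigration", "social_security", "irs", "passports"]

def make_case(rng: random.Random) -> dict:
    """Produce one fictitious casework record."""
    return {
        "case_id": f"CASE-{rng.randrange(10**6):06d}",
        "city": rng.choice(MD_CITIES),
        "issue": rng.choice(ISSUES),
        "doc_type": rng.choice(["outcome_letter", "incoming_letter", "email",
                                "phone_screen", "walk_in_form", "web_form"]),
        "watermark": WATERMARK,   # stamped onto the rendered PDF
    }

rng = random.Random(42)           # fixed seed -> same corpus every run
cases = [make_case(rng) for _ in range(3)]
```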

Generated on 2026-05-10 and 2026-05-11. Every PDF is watermarked "MOCK DATA — FOR TESTING ONLY".

The hardware behind it

A commodity Linux file server (PostgreSQL, pgvector, FastAPI, the ingest pipelines, the React UI) plus an NVIDIA GB10 Superchip with 128GB unified memory for AI inference. Three local models: bge-m3 for embeddings, Qwen 2.5 14B for classification and NL→SQL, Llama 3.3 70B (planned) for drafting and long-form briefings. Total inference stack ~50GB; 78GB headroom.
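The memory arithmetic works out roughly like this. The per-model sizes below are illustrative assumptions (e.g. 4-bit quantization for the larger models); only the 128GB unified memory, the ~50GB stack total, and the 78GB headroom figures come from the text.

```python
# Back-of-the-envelope memory budget for the GB10's unified memory.
# Per-model sizes are assumptions; totals match the figures in the text.
UNIFIED_MEMORY_GB = 128

models_gb = {
    "bge-m3 (embeddings)":        1,   # assumed
    "Qwen 2.5 14B (quantized)":   9,   # assumed
    "Llama 3.3 70B (quantized)": 40,   # assumed, planned
}

stack_gb = sum(models_gb.values())           # ~50 GB inference stack
headroom_gb = UNIFIED_MEMORY_GB - stack_gb   # 78 GB left for KV cache etc.
print(stack_gb, headroom_gb)
```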

Open-source throughout. No cloud, no vendor lock-in, no per-seat licensing.

The Learning Engine

New: Continuous self-training from public data

A separate microservice that watches public, free data sources every two hours — Congress.gov bill actions and floor votes, govinfo committee hearings, news headlines, Bluesky social posts — and turns each new observation into candidate question-and-SQL training pairs. Validated pairs flow automatically into the search index, so the demo gets measurably smarter the longer it runs without anyone in the loop.

Five data sources active. Polls autonomously every 2 hours via systemd timer. Pairs are auto-validated against live data and only added if the SQL returns sensible results.
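The auto-validation gate described above might look like the sketch below: a candidate question/SQL pair is kept only if its SQL executes cleanly against live data and returns a non-empty, bounded result. Here sqlite3 stands in for the real PostgreSQL database, and the acceptance thresholds are assumptions.

```python
# Sketch of auto-validating candidate NL->SQL training pairs against
# live data. sqlite3 is a stand-in for PostgreSQL; the "sensible result"
# criteria (non-empty, bounded row count) are assumptions.
import sqlite3

def validate_pair(conn: sqlite3.Connection, sql: str,
                  max_rows: int = 10_000) -> bool:
    """Accept a candidate pair only if its SQL runs and looks sane."""
    try:
        rows = conn.execute(sql).fetchmany(max_rows + 1)
    except sqlite3.Error:
        return False                    # SQL doesn't even execute: reject
    # Reject empty results and implausibly huge ones.
    return 0 < len(rows) <= max_rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bills (id INTEGER, title TEXT)")
conn.execute("INSERT INTO bills VALUES (1, 'Example Act')")

good = validate_pair(conn, "SELECT title FROM bills WHERE id = 1")
bad  = validate_pair(conn, "SELECT nope FROM missing_table")
```

Only pairs that pass this gate flow into the search index; rejected pairs are simply dropped, so a bad generation can never poison retrieval.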

Ready to explore?
Demo is read-only. Clicking, querying, and drilling in are all safe.
📄 Download Briefing (PDF) Launch Demo →