A large-state Senate office is not a communications operation — it is a signal-processing organization that also does communications. Tens of thousands of constituent contacts per week. Dozens of bills per legislative day. Press inquiries, casework, social mentions, meeting requests arriving faster than any 60-person staff can absorb. Most of this data is read once, then disappears.
This demo shows what changes when the office runs its own AI infrastructure — on its own hardware, inside the Senate enclave, with constituent PII that never leaves the building.
Two data sources, two different stories:
The first source is legislative data: an initial corpus ingested directly from the Congress.gov public data API, covering bill text, sponsors, cosponsors, action history, policy areas, and committee assignments. Each bill summary is chunked, embedded with the bge-m3 model (1024-dimensional vectors), and stored in pgvector behind an HNSW index for semantic search.
Daily auto-refresh is planned; for now the corpus covers the 117th and 118th Congresses as of the initial ingest.
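A minimal sketch of that ingest path, assuming psycopg plus the pgvector and sentence-transformers Python packages; the bill_chunks schema, chunk sizes, and bill ID are illustrative, not the demo's actual code:

```python
import psycopg
from pgvector.psycopg import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")  # 1024-dim dense embeddings

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS bill_chunks (
    id        bigserial PRIMARY KEY,
    bill_id   text NOT NULL,
    chunk     text NOT NULL,
    embedding vector(1024)
);
CREATE INDEX IF NOT EXISTS bill_chunks_hnsw
    ON bill_chunks USING hnsw (embedding vector_cosine_ops);
"""

def chunk(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    """Fixed-width character chunking with overlap (illustrative numbers)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def ingest(conn: psycopg.Connection, bill_id: str, summary: str) -> None:
    """Chunk one bill summary, embed each chunk, and store the vectors."""
    pieces = chunk(summary)
    vectors = model.encode(pieces, normalize_embeddings=True)
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO bill_chunks (bill_id, chunk, embedding)"
            " VALUES (%s, %s, %s)",
            [(bill_id, p, v) for p, v in zip(pieces, vectors)],
        )

with psycopg.connect("dbname=legis") as conn:
    conn.execute(DDL)
    register_vector(conn)  # adapts numpy vectors to the pgvector type
    ingest(conn, "hr-1234-118", "Bill summary text from the API goes here.")
```

HNSW trades slower index builds for fast approximate nearest-neighbor lookups, which suits a read-heavy corpus that only grows on ingest.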
The second source is synthetic constituent casework, generated to reflect the realistic volume, geographic spread, and casework complexity of a large Maryland office. Every name, address, phone number, and case detail is fictitious. Each generated document (outcome letter, incoming letter, email, phone screen, walk-in form, web-form submission) is rendered as a real PDF and stored on disk, the same way scanned office records would exist in a real deployment.
Generated 2026-05-10 and 2026-05-11, and watermarked "MOCK DATA — FOR TESTING ONLY" on every PDF.
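A sketch of how one of those PDFs could be rendered, assuming reportlab as the renderer (the text above doesn't name one); the output path and letter body are placeholders, while the watermark string is the real one:

```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def render_mock_letter(path: str, body: str) -> None:
    """Render a fictitious casework document as a watermarked PDF."""
    c = canvas.Canvas(path, pagesize=letter)
    width, height = letter

    # Diagonal gray watermark drawn underneath the document text.
    c.saveState()
    c.setFont("Helvetica-Bold", 40)
    c.setFillGray(0.85)
    c.translate(width / 2, height / 2)
    c.rotate(45)
    c.drawCentredString(0, 0, "MOCK DATA — FOR TESTING ONLY")
    c.restoreState()

    # Document body, one line at a time.
    c.setFont("Helvetica", 11)
    y = height - 72
    for line in body.splitlines():
        c.drawString(72, y, line)
        y -= 14
    c.showPage()
    c.save()

render_mock_letter(
    "mock/outcome_letter_0001.pdf",
    "Dear Ms. Example,\nYour casework inquiry has been resolved.\nSincerely,",
)
```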
The hardware is a commodity Linux file server (PostgreSQL, pgvector, FastAPI, the ingest pipelines, the React UI) plus an NVIDIA GB10 Superchip with 128GB of unified memory for AI inference. Three local models: bge-m3 for embeddings, Qwen 2.5 14B for classification and NL→SQL, and Llama 3.3 70B (planned) for drafting and long-form briefings. The total inference stack is ~50GB, leaving 78GB of headroom.
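The memory budget is easy to sanity-check. In the sketch below, the per-model figures are illustrative assumptions (quantized weights plus a KV-cache allowance); only the ~50GB total and the 78GB headroom come from the paragraph above:

```python
UNIFIED_MEMORY_GB = 128  # NVIDIA GB10 unified memory

# Assumed per-model footprints; the demo only states the ~50GB total.
stack_gb = {
    "bge-m3 (embeddings)": 3,
    "Qwen 2.5 14B (classification, NL->SQL)": 12,
    "Llama 3.3 70B (planned, drafting)": 35,
}

total = sum(stack_gb.values())
print(f"stack ~{total}GB, headroom ~{UNIFIED_MEMORY_GB - total}GB")
# -> stack ~50GB, headroom ~78GB
```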
Open-source throughout. No cloud, no vendor lock-in, no per-seat licensing.
Finally, a separate microservice watches public, free data sources every two hours (Congress.gov bill actions and floor votes, govinfo committee hearings, news headlines, Bluesky social posts) and turns each new observation into candidate question-and-SQL training pairs. Validated pairs flow automatically into the search index, so the demo gets measurably smarter the longer it runs without anyone in the loop.
Five data sources are active, polled autonomously every two hours via a systemd timer. Pairs are auto-validated against live data and added only if the SQL returns sensible results.
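A minimal sketch of that validation gate, under stated assumptions: candidate SQL runs read-only against the live database with a statement timeout, and a pair is indexed only if its query executes and returns a non-empty, bounded result. The SELECT-only check, 5-second timeout, and row bound are invented for illustration:

```python
import psycopg

MAX_ROWS = 10_000  # assumed sanity bound on result size

def validate_sql(dsn: str, sql: str) -> bool:
    """Return True if a generated question->SQL pair should be indexed."""
    if not sql.lstrip().lower().startswith("select"):
        return False  # only read-only queries are eligible
    try:
        with psycopg.connect(dsn, options="-c statement_timeout=5000") as conn:
            conn.read_only = True  # belt and suspenders against writes
            rows = conn.execute(sql).fetchall()
    except psycopg.Error:
        return False  # malformed or failing SQL never reaches the index
    return 0 < len(rows) <= MAX_ROWS
```

Pairs that fail are simply dropped, so a bad generation costs nothing but the attempt.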