Methodology
How we collect, process, and present Comal County spending data. Understand our data pipeline and its limitations.
Important: Data Accuracy Limitations
The data presented on this portal is extracted from scanned PDF documents using Optical Character Recognition (OCR) technology. These source documents are often:
- Photocopied multiple times, reducing image quality
- Scanned at varying resolutions and angles
- Subject to ink smudges, fading, or alignment issues
- Formatted inconsistently across different time periods
As a result, some data may contain OCR errors. Always verify critical information against the original PDF documents linked on each page.
Data Pipeline Overview
Data flows from the official Comal County website through our automated pipeline, ultimately reaching this portal where you can explore it interactively.
How It Works
Document Discovery & Download
Every week, the Comal County Auditor's Office publishes claims reports as PDF documents on the county website. Our automated scraper monitors for new documents and downloads them for processing.
Source: comal.tx.publicsearch.us
Documents include: Regular Claims, EFT Payments, Check Registers
AI Vision Parsing with Claude
The PDF documents are scanned images, not digital text. We use Anthropic's Claude AI with vision capabilities to read and extract structured data from each page. Claude can understand the tabular format of claims documents and extract vendor names, amounts, fund codes, and more.
Powered by Claude AI
Claude's vision model analyzes each PDF page as an image, understanding the layout and extracting data more accurately than traditional OCR. However, poor scan quality can still cause errors.
Why errors still occur: Even advanced AI vision can struggle with documents that are photocopied multiple times, have low contrast, or contain smudged/faded text. Characters like "0" vs "O" or "$1,234" vs "$1.234" can be misread.
Data Parsing & Validation
Our custom parser extracts structured data from the OCR output: vendor names, amounts, fund codes, department codes, and dates. We run validation checks comparing our parsed totals against the cover page totals printed on each document.
Parsed total matches document total (99.9%+ accuracy)
Small discrepancy, some claims may have OCR errors
Database Storage
Validated data is stored in a PostgreSQL database hosted on Supabase. The database maintains relationships between documents, claims, vendors, departments, funds, and accounts.
Database Schema:
Web Portal Presentation
This website queries the database in real-time to present spending data in an accessible, searchable format. You can explore by vendor, department, fund, category, or browse individual documents.
Data Sources
All data is sourced from official Comal County government websites. We monitor the following URLs for new documents:
https://comal.tx.publicsearch.us/
Primary source for claims and payment documents
https://agenda.comal.tx.us/
Commissioners Court agendas and attachments
Database Coverage
Time Period
2020 – Present
Our database contains claims and payment records starting from January 2020 through the most recent weekly update.
Documents Processed
290+ PDFs
Weekly claims reports, typically 10-50 pages each, containing itemized payment records.
Total Records
100,000+ Claims
Individual payment transactions extracted from claims documents.
Total Spending Tracked
$300M+
Cumulative county spending across all tracked time periods.
Update Frequency
Weekly updates — New claims documents are typically published by the County Auditor each week. Our system checks for new documents and processes them automatically.
Last document date shown on the Documents page reflects the most recent data available.
Known Limitations
OCR Accuracy
Dollar amounts, vendor names, and codes may contain errors from poor scan quality.
Vendor Name Variations
The same vendor may appear under slightly different names due to OCR errors or data entry variations in source documents.
Historical Data Gaps
Some older documents may not be available online or may have parsing issues due to different formats.
Not Official Records
This portal is an independent project and is not affiliated with Comal County government. For official records, contact the County Auditor's Office directly.
How to Verify Data
Every data point on this portal can be traced back to its source document. When viewing claims, departments, or vendors, look for the "View PDF" link to access the original document.
If you find an error, the original PDF is always the authoritative source. We encourage users to verify important information against the source documents.