Comal County

Methodology

How we collect, process, and present Comal County spending data. Understand our data pipeline and its limitations.

Important: Data Accuracy Limitations

The data presented on this portal is extracted from scanned PDF documents using Optical Character Recognition (OCR) technology. These source documents are often:

  • Photocopied multiple times, reducing image quality
  • Scanned at varying resolutions and angles
  • Subject to ink smudges, fading, or alignment issues
  • Formatted inconsistently across different time periods

As a result, some data may contain OCR errors. Always verify critical information against the original PDF documents linked on each page.

Data Pipeline Overview

County WebsitePDF Documents
ScraperDownloads PDFs
Claude AIVision Parsing
SupabasePostgreSQL DB
This PortalYou are here!

Data flows from the official Comal County website through our automated pipeline, ultimately reaching this portal where you can explore it interactively.

How It Works

1

Document Discovery & Download

Every week, the Comal County Auditor's Office publishes claims reports as PDF documents on the county website. Our automated scraper monitors for new documents and downloads them for processing.

Source: comal.tx.publicsearch.us

Documents include: Regular Claims, EFT Payments, Check Registers

2

AI Vision Parsing with Claude

The PDF documents are scanned images, not digital text. We use Anthropic's Claude AI with vision capabilities to read and extract structured data from each page. Claude can understand the tabular format of claims documents and extract vendor names, amounts, fund codes, and more.

Powered by Claude AI

Claude's vision model analyzes each PDF page as an image, understanding the layout and extracting data more accurately than traditional OCR. However, poor scan quality can still cause errors.

Why errors still occur: Even advanced AI vision can struggle with documents that are photocopied multiple times, have low contrast, or contain smudged/faded text. Characters like "0" vs "O" or "$1,234" vs "$1.234" can be misread.

3

Data Parsing & Validation

Our custom parser extracts structured data from the OCR output: vendor names, amounts, fund codes, department codes, and dates. We run validation checks comparing our parsed totals against the cover page totals printed on each document.

Validated

Parsed total matches document total (99.9%+ accuracy)

Partial Match

Small discrepancy, some claims may have OCR errors

4

Database Storage

Validated data is stored in a PostgreSQL database hosted on Supabase. The database maintains relationships between documents, claims, vendors, departments, funds, and accounts.

Database Schema:

📄 documents
💰 claims
🏢 vendors
🏛️ departments
💵 funds
🏷️ accounts
5

Web Portal Presentation

This website queries the database in real-time to present spending data in an accessible, searchable format. You can explore by vendor, department, fund, category, or browse individual documents.

Next.js 14ReactTailwind CSSVercel

Data Sources

All data is sourced from official Comal County government websites. We monitor the following URLs for new documents:

https://comal.tx.publicsearch.us/

Primary source for claims and payment documents

https://agenda.comal.tx.us/

Commissioners Court agendas and attachments

Database Coverage

Time Period

2020 – Present

Our database contains claims and payment records starting from January 2020 through the most recent weekly update.

Documents Processed

290+ PDFs

Weekly claims reports, typically 10-50 pages each, containing itemized payment records.

Total Records

100,000+ Claims

Individual payment transactions extracted from claims documents.

Total Spending Tracked

$300M+

Cumulative county spending across all tracked time periods.

Update Frequency

Weekly updates — New claims documents are typically published by the County Auditor each week. Our system checks for new documents and processes them automatically.

Last document date shown on the Documents page reflects the most recent data available.

Known Limitations

OCR Accuracy

Dollar amounts, vendor names, and codes may contain errors from poor scan quality.

Vendor Name Variations

The same vendor may appear under slightly different names due to OCR errors or data entry variations in source documents.

Historical Data Gaps

Some older documents may not be available online or may have parsing issues due to different formats.

Not Official Records

This portal is an independent project and is not affiliated with Comal County government. For official records, contact the County Auditor's Office directly.

How to Verify Data

Every data point on this portal can be traced back to its source document. When viewing claims, departments, or vendors, look for the "View PDF" link to access the original document.

If you find an error, the original PDF is always the authoritative source. We encourage users to verify important information against the source documents.