Florida has 67 counties. Each county submits an annual property tax ratio study (Report FF09) to the Department of Revenue. Each report is a complex document — tables, statistical ratios, compliance thresholds, confidence intervals, assessor notes.
Every year, DOR analysts must:
- Open each document manually
- Find the COD (Coefficient of Dispersion) and PRD (Price-Related Differential) values
- Check them against IAAO standards (different thresholds per property type)
- Write a compliance determination
- Generate an executive summary memo for county officials
That's 67 counties × 10 years of historical data: 670 documents, every one of them processed by hand.
This pipeline automates the entire workflow using Snowflake Cortex AI — and the data never leaves Snowflake's security boundary.
MODULE 00: Environment Check
SET ROLE · SET WAREHOUSE · Verify Cortex accessible
↓
MODULE 01: Session Schema & Stage Setup
Unique session ID → SESSION_XXXXXXXX schema
Encrypted internal stage (SNOWFLAKE_SSE)
Copy documents from shared stage
↓
MODULE 02: Parse Document — OCR Mode
SNOWFLAKE.CORTEX.PARSE_DOCUMENT → plain text
Good for: simple extraction
Limitation: no table structure preserved
↓
MODULE 03: Parse Document — Layout Mode
SNOWFLAKE.CORTEX.PARSE_DOCUMENT → Markdown
Tables preserved with | column | separators |
Best choice for AI downstream prompts
↓
MODULE 04: Parse Document — Page Split
page_split: TRUE → one row per page
Enables parallel processing at scale
67 counties × 10 years = parallelized
↓
MODULE 05: Store & Catalog
CREATE TABLE parsed_documents
county · report_year · text_content · page_count
CLUSTER BY (county, report_year)
↓
MODULE 06: AI Extract — Structured Metrics
CORTEX.COMPLETE → JSON extraction
county · total_parcels · overall_ratio · strata data
↓
MODULE 07: AI Analysis — Compliance Check
CORTEX.COMPLETE → IAAO standards check
COD thresholds · PRD bounds · Risk level
↓
MODULE 08: AI Classification
AI_CLASSIFY → property focus type
Residential · Commercial · Agricultural · Mixed
↓
MODULE 09: AI Executive Summary
CORTEX.COMPLETE → government memo format
Overview · Key Findings · Concerns · Recommendations
↓
MODULE 10: Enrichment Pipeline
All AI combined into enriched_ratio_study table
compliance_status · risk_level · key_metrics_json
↓
MODULE 11: Cleanup
DROP SCHEMA → removes all session data
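Modules 04 and 05 have no code listing in this walkthrough, so here is a minimal sketch of what they might look like together. The `:pages` field name, the literal county/year values, and the stage name are assumptions for illustration, not the lab's exact code:

```sql
-- Module 04 sketch: page_split => one row per page
-- Module 05 sketch: catalog the pages, clustered for county/year scans
CREATE OR REPLACE TABLE parsed_documents
  CLUSTER BY (county, report_year)
AS
SELECT
    'Alachua'                AS county,       -- would come from file metadata
    2024                     AS report_year,  -- would come from file metadata
    p.value:content::VARCHAR AS text_content,
    p.index + 1              AS page_number
FROM TABLE(FLATTEN(input =>
    SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
        @document_stage,
        'county_ratio_study.docx',
        {'mode': 'LAYOUT', 'page_split': TRUE}
    ):pages
)) AS p;
```

Flattening the page array is what makes the 670-document backlog parallelizable: each page becomes an independent row that downstream AI calls can process in bulk.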
```sql
-- LAYOUT mode preserves table structure for AI
SELECT
  SNOWFLAKE.CORTEX.PARSE_DOCUMENT(
    @document_stage,
    'county_ratio_study.docx',
    {'mode': 'LAYOUT'}
  ):content::VARCHAR AS document_markdown;
```

```sql
-- Claude Sonnet checks against IAAO standards
SELECT
  county,
  TRIM(SNOWFLAKE.CORTEX.COMPLETE(
    'claude-4-sonnet',
    CONCAT(
      'Based on IAAO standards (COD < 15% residential, ',
      '< 20% commercial; PRD between 0.98 and 1.03), ',
      'is this county IN COMPLIANCE? ',
      'Answer with EXACTLY one word: ',
      'COMPLIANT, NEEDS_REVIEW, or NON_COMPLIANT.\n\n',
      'Document:\n', LEFT(text_content, 4000)
    )
  )) AS compliance_status
FROM parsed_documents;
```

```sql
-- Generates a professional government memo automatically
SELECT
  SNOWFLAKE.CORTEX.COMPLETE(
    'claude-4-sonnet',
    CONCAT(
      'You are a senior analyst at the Florida Department of Revenue. ',
      'Generate an executive summary memo with sections: ',
      'OVERVIEW · KEY FINDINGS · STRATUM PERFORMANCE · ',
      'AREAS OF CONCERN · RECOMMENDATIONS\n\n',
      'Document:\n', text_content
    )
  ) AS executive_summary
FROM parsed_documents;
```

```sql
CREATE OR REPLACE TABLE enriched_ratio_study AS
SELECT
  county, report_year,
  AI_CLASSIFY(text_content, [
    'Residential Focus', 'Commercial Focus',
    'Mixed Property Types', 'Agricultural Focus'
  ]):labels[0]::VARCHAR AS primary_property_focus,
  SNOWFLAKE.CORTEX.COMPLETE('claude-4-sonnet', ...) AS key_metrics_json,
  TRIM(SNOWFLAKE.CORTEX.COMPLETE(...)) AS compliance_status,
  TRIM(SNOWFLAKE.CORTEX.COMPLETE(...)) AS risk_level
FROM parsed_documents;
```

This is the most valuable part of this project — not just building it, but auditing it.
| Control | Implementation | Why It Matters |
|---|---|---|
| RBAC | `USE ROLE DOR_LAB_ROLE` | Least-privilege access |
| Encryption at rest | `SNOWFLAKE_SSE` on stage | Documents encrypted in storage |
| Data residency | Cortex runs inside Snowflake | Data never reaches Anthropic/Meta servers |
| Session isolation | Unique `SESSION_XXXXXXXX` schema | Users cannot see each other's data |
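The RBAC, encryption, and session-isolation controls correspond to the Module 00/01 setup. A minimal sketch, assuming illustrative warehouse and stage names (the lab's actual object names may differ):

```sql
USE ROLE DOR_LAB_ROLE;                 -- RBAC: least-privilege lab role
USE WAREHOUSE DOR_LAB_WH;              -- assumed warehouse name

-- Session isolation: a unique schema per user session
SET lab_schema = 'SESSION_' || UPPER(RANDSTR(8, RANDOM()));
CREATE SCHEMA IDENTIFIER($lab_schema);
USE SCHEMA IDENTIFIER($lab_schema);

-- Encryption at rest: server-side encrypted internal stage
CREATE STAGE document_stage
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')
  DIRECTORY  = (ENABLE = TRUE);
```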
| Gap | Risk Level | Fix Required |
|---|---|---|
| No PII masking | 🔴 HIGH | Dynamic data masking on text_content column |
| No network policy | 🔴 HIGH | Agency IP allowlist — restrict to VPN/network only |
| No MFA enforcement | 🔴 HIGH | Account-level MFA policy (FISMA requirement) |
| Incomplete audit logging | 🟡 MEDIUM | Access History + Object Tagging + SIEM integration |
| No AI model governance | 🟡 MEDIUM | Model cards + version logging + human review gate |
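The two account-level HIGH gaps, network policy and MFA, are each a couple of statements. A sketch with illustrative policy names and CIDR range (requires an account-administration role; verify the exact syntax against your Snowflake edition):

```sql
-- Restrict logins to the agency VPN / network egress range (illustrative CIDR)
CREATE NETWORK POLICY dor_agency_only
  ALLOWED_IP_LIST = ('203.0.113.0/24');
ALTER ACCOUNT SET NETWORK_POLICY = dor_agency_only;

-- Enforce MFA enrollment account-wide
CREATE AUTHENTICATION POLICY dor_require_mfa
  MFA_ENROLLMENT = REQUIRED;
ALTER ACCOUNT SET AUTHENTICATION POLICY dor_require_mfa;
```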
```sql
-- Add this before going to production:
CREATE MASKING POLICY mask_document_text AS
  (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('DOR_ADMIN_ROLE') THEN val
    ELSE '*** DOCUMENT RESTRICTED — INSUFFICIENT PRIVILEGES ***'
  END;

ALTER TABLE parsed_documents
  MODIFY COLUMN text_content
  SET MASKING POLICY mask_document_text;
```

The Hidden Risk Nobody Talks About — Time Travel

```sql
-- Module 11 runs: DROP SCHEMA IF EXISTS
-- But Snowflake Time Travel keeps "dropped" data recoverable:
-- 1 day by default, configurable up to 90 days on Enterprise Edition.
-- An ACCOUNTADMIN can simply UNDROP the schema.
-- The correct production cleanup:
ALTER SCHEMA IDENTIFIER($lab_schema)
  SET DATA_RETENTION_TIME_IN_DAYS = 0;           -- disable Time Travel FIRST
DROP SCHEMA IF EXISTS IDENTIFIER($lab_schema);   -- now truly deleted
```

```sql
-- Current code: no model version recorded
-- Risk: the same query run weeks apart may give a different compliance
-- answer if Snowflake silently updates the Claude model version
-- Production fix: always log model + timestamp
-- (Snowflake only allows constant defaults on columns added via ALTER,
--  so record the timestamp at write time instead of via DEFAULT)
ALTER TABLE enriched_ratio_study ADD COLUMN
  model_version VARCHAR DEFAULT 'claude-4-sonnet-20250514';
ALTER TABLE enriched_ratio_study ADD COLUMN
  inference_timestamp TIMESTAMP_NTZ;

UPDATE enriched_ratio_study
SET inference_timestamp = CURRENT_TIMESTAMP()
WHERE inference_timestamp IS NULL;
```

These are questions that stump even senior engineers:
- **Output validation gap:** `TRIM(CORTEX.COMPLETE(...))` expects exactly one word back. What if the model returns "The county appears to be COMPLIANT" instead of "COMPLIANT"? The enriched table stores a wrong label silently — no error, no alert.
- **Context truncation risk:** `LEFT(text_content, 4000)` cuts the document at 4,000 characters. What if the critical COD/PRD values for Stratum 4 are on page 3 — past the cutoff? The compliance decision was made on incomplete data.
- **Legal liability:** If the AI incorrectly labels a county as COMPLIANT when it should be NON_COMPLIANT, and DOR uses that to make a legal determination, who is liable? There is no human review gate in the current pipeline.
- **Model drift:** The same query run today vs. in 3 months may give a different compliance result if Snowflake updates the Claude model. Government records must be reproducible. How do you version-control AI outputs?
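The output-validation risk, at least, can be mitigated mechanically. A sketch of an allowlist check that quarantines any off-script model response instead of storing it as fact (the prompt is elided; the labels match the Module 07 prompt):

```sql
WITH raw AS (
    SELECT county,
           TRIM(SNOWFLAKE.CORTEX.COMPLETE('claude-4-sonnet', ...)) AS raw_status
    FROM parsed_documents
)
SELECT county,
       IFF(raw_status IN ('COMPLIANT', 'NEEDS_REVIEW', 'NON_COMPLIANT'),
           raw_status,
           'VALIDATION_FAILED') AS compliance_status  -- route to human review
FROM raw;
```

Rows flagged `VALIDATION_FAILED` should feed a human review queue rather than the compliance report.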
| File | Description |
|---|---|
| `fl_dor_lab.sql` | Complete 11-module Snowflake SQL lab |
CURRENT (Lab):
1 county · 1 document · Manual trigger · No masking
NEXT (Pilot):
3 counties · Batch processing · Masked PII
Network policy · MFA enforced · Model version logging
FULL PRODUCTION:
67 counties × 10 years = 670 documents
Snowflake Task (automated trigger on new file upload)
Stream on parsed_documents → detect new inserts
Power BI / Tableau connected to enriched_ratio_study
Full audit trail · Human review gate · SIEM integrated
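The Stream-plus-Task trigger in the production plan could be wired roughly like this; the warehouse name, schedule, and elided enrichment SELECT are placeholders:

```sql
-- Detect new parsed pages as they land
CREATE OR REPLACE STREAM parsed_docs_stream ON TABLE parsed_documents;

-- Run enrichment only when the stream actually has new rows
CREATE OR REPLACE TASK enrich_new_ratio_studies
  WAREHOUSE = DOR_LAB_WH            -- placeholder warehouse
  SCHEDULE  = '60 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('PARSED_DOCS_STREAM')
AS
  INSERT INTO enriched_ratio_study
  SELECT county, report_year, ...   -- Module 10 enrichment logic
  FROM parsed_docs_stream;

ALTER TASK enrich_new_ratio_studies RESUME;  -- tasks are created suspended
```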
"AI inside your secure environment is fundamentally different from AI in the public cloud. But 'inside' doesn't mean 'safe' — it means you've solved one of five problems."
- `SNOWFLAKE_SSE` encryption solves storage security — not access control, not audit, not AI governance
- Data residency (Cortex running inside Snowflake) is the most important security property for government use
- `TRY_PARSE_JSON` returning NULL on AI output failure is silent — always validate AI outputs
- Time Travel is a compliance liability if you DROP schemas containing confidential documents without setting `DATA_RETENTION_TIME_IN_DAYS = 0` first
- Model versioning isn't optional in government AI — it's a reproducibility and legal requirement
Author: Ashok Ankalla — Enterprise Data & AI Transformation Leader Analyzed and extended with production security design · 2026