Parse COI PDFs with a REST API: Dev Guide

Certificate of Insurance PDFs are structurally simple documents - they follow ACORD form templates, have predictable field positions, and carry a finite set of data points. Yet most engineering teams that try to parse them for the first time spend weeks on edge cases: scanned images, PDFs exported from multiple carriers with slightly different layouts, missing optional fields that should be null rather than absent, and coverage types that appear under different ACORD form variants.

COI ParseAPI handles all of that. You send a PDF, you get back a clean, consistent JSON payload every time. This guide walks through the full integration - from your first authenticated request to handling compliance logic, webhooks, and edge cases in production. By the end you'll have a working integration that accepts a COI PDF upload, parses it, applies your compliance requirements, and stores the results.

The guide uses Python and Node.js examples throughout. The API is language-agnostic - if you can send a multipart HTTP request, you can use it.

What the API Returns

Before writing a single line of integration code, it's worth understanding exactly what the API produces. The response schema is consistent regardless of carrier, form variant, or PDF generation method. Every response includes the form type, all named parties, every coverage line present on the certificate, compliance flags, and a calculated compliance score.

Here is a full example response for an ACORD 25 general liability certificate:

{
  "form_type": "ACORD_25",
  "parse_confidence": 0.97,
  "policyholder": {
    "name": "Apex Contractors LLC",
    "address": "123 Industrial Way, Chicago, IL 60601"
  },
  "certificate_holder": {
    "name": "Westside Properties LLC",
    "address": "456 Main St, Chicago, IL 60602"
  },
  "insurer": {
    "name": "Travelers Insurance",
    "naic_code": "25674"
  },
  "coverages": {
    "general_liability": {
      "each_occurrence": 1000000,
      "general_aggregate": 2000000,
      "products_aggregate": 2000000,
      "personal_advertising": 1000000,
      "policy_number": "GL-2024-789456",
      "effective_date": "2026-01-01",
      "expiration_date": "2027-01-01"
    },
    "auto_liability": {
      "combined_single_limit": 1000000,
      "policy_number": "CA-2024-112233",
      "effective_date": "2026-01-01",
      "expiration_date": "2027-01-01"
    },
    "umbrella": {
      "each_occurrence": 5000000,
      "aggregate": 5000000,
      "policy_number": "UMB-2024-334455",
      "effective_date": "2026-01-01",
      "expiration_date": "2027-01-01"
    },
    "workers_compensation": {
      "el_each_accident": 500000,
      "el_disease_policy_limit": 500000,
      "el_disease_each_employee": 500000,
      "policy_number": "WC-2024-556677",
      "effective_date": "2026-01-01",
      "expiration_date": "2027-01-01"
    }
  },
  "additional_insured": true,
  "waiver_of_subrogation": true,
  "compliance_score": 87,
  "flags": ["umbrella_sublimit_detected"],
  "raw_text_available": true,
  "parsed_at": "2026-03-24T14:22:10Z"
}

A few things worth noting. All monetary values are integers representing whole dollar amounts - no strings, no formatted numbers with commas. Dates are ISO 8601 strings (YYYY-MM-DD). Within a coverage block, a field that is blank or unreadable on the PDF is returned as null rather than omitted - this distinction matters when you're writing compliance checks. A coverage type that does not appear on the certificate at all is left out of the coverages object entirely (see "Null vs. zero vs. missing" below). The parse_confidence field (0.0 to 1.0) reflects the OCR confidence for scanned documents; native digital PDFs typically score 0.95 or higher.
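
Because dates arrive as ISO 8601 strings and limits as plain integers, both can be consumed with the standard library alone. The helper below is a minimal sketch, not part of the API: the name days_until_expiration is hypothetical, and it assumes a coverage block shaped like the example response above.

```python
from datetime import date
from typing import Optional


def days_until_expiration(coverage: Optional[dict], today: date) -> Optional[int]:
    """Return the days remaining on a coverage line, or None if unknown."""
    if not coverage:
        return None  # coverage block is absent from the certificate
    raw = coverage.get("expiration_date")
    if raw is None:
        return None  # field was blank or unreadable on the PDF
    # ISO 8601 dates (YYYY-MM-DD) parse directly with fromisoformat
    return (date.fromisoformat(raw) - today).days


gl = {"each_occurrence": 1000000, "expiration_date": "2027-01-01"}
print(days_until_expiration(gl, date(2026, 3, 24)))
# Integer limits format cleanly without any string cleanup
print(f"${gl['each_occurrence']:,}")
```

Returning None for both the absent block and the null date keeps the caller from mistaking "unknown" for "expired".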

For ACORD 28 certificates (commercial property), the coverages object uses a different schema with property-specific fields. See our guide on ACORD 25 vs ACORD 28 differences for the complete breakdown.

Authentication and Rate Limits

All requests require a Bearer token in the Authorization header. You get your API key from the dashboard after signup - it looks like cpapi_live_sk_xxxxxxxxxxxxxxxx for production and cpapi_test_sk_xxxxxxxxxxxxxxxx for the sandbox environment.

The sandbox accepts any valid PDF and returns a mocked response without consuming parse credits. Use it for integration testing and CI pipelines.

Authorization: Bearer cpapi_live_sk_xxxxxxxxxxxxxxxx
Content-Type: multipart/form-data

Rate limits vary by plan tier:

Plan	Requests / Minute	Requests / Day	Max File Size
Starter	10	500	10 MB
Growth	60	5,000	25 MB
Scale	300	50,000	50 MB
Enterprise	Custom	Custom	100 MB

When you hit a rate limit, the API returns HTTP 429 with a Retry-After header containing the number of seconds to wait. Always implement exponential backoff with jitter for batch processing jobs. Your retry loop should respect the Retry-After value rather than using a fixed delay.
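
One way to combine the two rules above is to compute a jittered exponential delay and never sleep for less than the server's Retry-After value. This is a sketch under assumptions: retry_delay and call_with_retry are hypothetical helper names, the base delay, cap, and attempt count are arbitrary choices, and send stands in for any callable that performs the request and returns an object with status_code and headers (a requests.Response fits that shape).

```python
import random
import time
from typing import Callable, Optional


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]
    jittered = random.uniform(0, min(cap, base * 2 ** attempt))
    try:
        server_wait = float(retry_after) if retry_after is not None else 0.0
    except ValueError:
        server_wait = 0.0  # header present but not a number of seconds
    # Never retry sooner than the server asked
    return max(server_wait, jittered)


def call_with_retry(send: Callable, max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-429 response or attempts run out."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, response.headers.get("Retry-After")))
    return response  # still rate limited after the final attempt
```

In a batch job, send would typically be a small closure that reopens the PDF and posts it, so each retry uploads from the start of the file.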

Uploading a PDF - The Basic Request

The core endpoint is POST /v1/parse. Send the PDF as multipart/form-data with the file in the document field. The response is synchronous for files under 5 MB on Growth and Scale plans - you get the parsed JSON in the same HTTP response.

Python (requests library)

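import os
import requests

API_KEY = "cpapi_live_sk_xxxxxxxxxxxxxxxx"
API_URL = "https://api.coiparseapi.com/v1/parse"

def parse_coi(pdf_path: str) -> dict:
    """Upload one COI PDF and return the parsed JSON payload."""
    with open(pdf_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            # Send only the base filename, not the local directory path
            files={"document": (os.path.basename(pdf_path), f, "application/pdf")},
            timeout=30
        )

    # Raise requests.HTTPError on any 4xx/5xx response
    response.raise_for_status()
    return response.json()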

# Usage
result = parse_coi("./certs/apex-contractors-coi.pdf")
print(f"Policyholder: {result['policyholder']['name']}")
print(f"GL Each Occurrence: ${result['coverages']['general_liability']['each_occurrence']:,}")
print(f"Compliance Score: {result['compliance_score']}")

Node.js (axios + form-data)

import axios from "axios";
import FormData from "form-data";
import fs from "fs";

const API_KEY = "cpapi_live_sk_xxxxxxxxxxxxxxxx";
const API_URL = "https://api.coiparseapi.com/v1/parse";

async function parseCOI(pdfPath) {
  const form = new FormData();
  form.append("document", fs.createReadStream(pdfPath), {
    filename: "certificate.pdf",
    contentType: "application/pdf",
  });

  const response = await axios.post(API_URL, form, {
    headers: {
      Authorization: `Bearer ${API_KEY}`,
      ...form.getHeaders(),
    },
    timeout: 30000,
  });

  return response.data;
}

// Usage
const result = await parseCOI("./certs/apex-contractors-coi.pdf");
console.log("Policyholder:", result.policyholder.name);
console.log("Expires:", result.coverages.general_liability?.expiration_date);

cURL

curl -X POST https://api.coiparseapi.com/v1/parse \
  -H "Authorization: Bearer cpapi_live_sk_xxxxxxxxxxxxxxxx" \
  -F "document=@./certs/apex-contractors-coi.pdf" \
  -o response.json

Adding a Compliance Requirements Template

Parsing a COI is only the first half of the job. The second half is checking whether it meets your specific requirements. Rather than writing that logic yourself, you can pass a requirements object in the request body and let the API calculate compliance for you.

Include the requirements as a JSON field alongside the file upload:

import requests
import json

API_KEY = "cpapi_live_sk_xxxxxxxxxxxxxxxx"
API_URL = "https://api.coiparseapi.com/v1/parse"

requirements = {
    "general_liability": {
        "each_occurrence": 1000000,
        "general_aggregate": 2000000
    },
    "auto_liability": {
        "combined_single_limit": 1000000
    },
    "umbrella": {
        "each_occurrence": 5000000
    },
    "additional_insured": True,
    "waiver_of_subrogation": True
}

with open("apex-contractors-coi.pdf", "rb") as f:
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"document": ("coi.pdf", f, "application/pdf")},
        # The requirements object travels as a JSON-encoded form field
        data={"requirements": json.dumps(requirements)},
        timeout=30
    )

# Surface 4xx/5xx errors before trying to read the body
response.raise_for_status()
result = response.json()
print(f"Compliance Score: {result['compliance_score']}/100")
print(f"Flags: {result['flags']}")

The compliance_score is a 0-100 integer. The scoring algorithm weights fields by risk impact: missing additional insured status or expired policies are high-severity deductions. A coverage limit that falls 10% short of the requirement scores differently than one that falls 50% short. You define the thresholds - the API calculates where the certificate lands.

What flags mean: Flags are machine-readable strings that identify specific issues. Common flags include:

gl_each_occurrence_below_requirement - General liability per-occurrence is below your specified minimum
additional_insured_missing - The AI checkbox is not checked and no AI endorsement language is detected in remarks
policy_expired - Any coverage line has an expiration date in the past
umbrella_sublimit_detected - Umbrella has a per-occurrence sublimit that may affect effective coverage
waiver_of_subrogation_missing - Waiver checkbox not found on any applicable line
coverage_gap_detected - Effective date is after your project start date or expiration is before end date

A score of 100 means the certificate satisfies every requirement you specified. Scores below 70 typically indicate at least one hard requirement is unmet and the certificate should be rejected pending correction.
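
Since flags are machine-readable, a rejection notice usually needs them translated into plain language. The mapping below is a sketch: the messages paraphrase the flag descriptions listed above, and FLAG_MESSAGES and describe_flags are hypothetical names rather than anything the API provides. Unrecognized flags fall through to a generic message so that flags added later do not break the lookup.

```python
FLAG_MESSAGES = {
    "gl_each_occurrence_below_requirement":
        "General liability per-occurrence limit is below the required minimum.",
    "additional_insured_missing":
        "Additional insured status is not indicated on the certificate.",
    "policy_expired":
        "At least one coverage line has an expiration date in the past.",
    "umbrella_sublimit_detected":
        "The umbrella policy has a per-occurrence sublimit that may reduce effective coverage.",
    "waiver_of_subrogation_missing":
        "Waiver of subrogation is not indicated on any applicable line.",
    "coverage_gap_detected":
        "Coverage dates do not span the full project period.",
}


def describe_flags(flags: list) -> list:
    """Translate API flag strings into messages suitable for a vendor notice."""
    return [FLAG_MESSAGES.get(flag, f"Unrecognized issue: {flag}") for flag in flags]


for line in describe_flags(["policy_expired", "umbrella_sublimit_detected"]):
    print("-", line)
```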

Handling the Response

In production, you need to handle both HTTP-level errors and application-level compliance decisions. Here is a complete response handler in Python that covers both:

import json
import requests

API_KEY = "cpapi_live_sk_xxxxxxxxxxxxxxxx"

def process_coi_upload(pdf_path: str, vendor_id: str, requirements: dict) -> dict:
    """
    Parse a COI PDF, check compliance, and return a structured result
    suitable for storing in your database.
    """
    try:
        with open(pdf_path, "rb") as f:
            response = requests.post(
                "https://api.coiparseapi.com/v1/parse",
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"document": ("coi.pdf", f, "application/pdf")},
                data={"requirements": json.dumps(requirements)},
                timeout=30
            )
        response.raise_for_status()
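    except requests.exceptions.Timeout:
        return {"status": "error", "error": "parse_timeout"}
    except requests.exceptions.HTTPError as e:
        return {"status": "error", "error": f"http_{e.response.status_code}"}
    except requests.exceptions.RequestException:
        # Connection failures, DNS errors, and other transport problems
        return {"status": "error", "error": "request_failed"}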

    data = response.json()
    # Fall back to an empty dict if the GL block is absent or null
    gl = data["coverages"].get("general_liability") or {}

    # Determine pass/fail
    passed = (
        data["compliance_score"] >= 70
        and not any("expired" in flag for flag in data.get("flags", []))
        and data.get("additional_insured") is True
    )

    return {
        "vendor_id": vendor_id,
        "status": "approved" if passed else "rejected",
        "compliance_score": data["compliance_score"],
        "flags": data["flags"],
        "policyholder_name": data["policyholder"]["name"],
        "insurer_name": data["insurer"]["name"],
        "insurer_naic": data["insurer"]["naic_code"],
        "gl_per_occurrence": gl.get("each_occurrence"),
        "gl_aggregate": gl.get("general_aggregate"),
        "gl_expiration": gl.get("expiration_date"),
        "additional_insured": data.get("additional_insured"),
        "waiver_of_subrogation": data.get("waiver_of_subrogation"),
        "form_type": data["form_type"],
        "parse_confidence": data["parse_confidence"],
        "parsed_at": data["parsed_at"]
    }

For your database schema, store every field the API returns - don't summarize. You'll want to run historical compliance queries (e.g., "show me all vendors whose GL expires in the next 30 days") and those require the raw field values. Store flags as a JSON array column rather than individual boolean columns - new flags may be added over time and array columns handle that gracefully.
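
As an illustration of that storage advice, the sketch below uses SQLite from the Python standard library. The table name, column names, and helper functions are all hypothetical; the point is only that keeping the raw payload, a JSON flags column, and an ISO 8601 expiration date makes the "expiring in the next 30 days" query a single range comparison.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # swap for a real database in production
conn.execute("""
    CREATE TABLE coi_results (
        vendor_id        TEXT NOT NULL,
        gl_expiration    TEXT,             -- ISO 8601 date, compares correctly as text
        compliance_score INTEGER,
        flags            TEXT NOT NULL,    -- JSON array of flag strings
        raw_response     TEXT NOT NULL     -- full API payload, unmodified
    )
""")


def store_result(vendor_id: str, result: dict) -> None:
    """Persist one parse result alongside the complete raw payload."""
    gl = result["coverages"].get("general_liability") or {}
    conn.execute(
        "INSERT INTO coi_results VALUES (?, ?, ?, ?, ?)",
        (vendor_id, gl.get("expiration_date"), result["compliance_score"],
         json.dumps(result["flags"]), json.dumps(result)),
    )


def expiring_within(days: int, today: str) -> list:
    """Vendor IDs whose GL policy expires between today and today + days."""
    rows = conn.execute(
        "SELECT vendor_id FROM coi_results "
        "WHERE gl_expiration BETWEEN ? AND date(?, ?)",
        (today, today, f"+{days} days"),
    )
    return [row[0] for row in rows]
```

Rows with a null gl_expiration never match the BETWEEN clause, so certificates with an unreadable date need their own review query.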

Building a Webhook-Ready Integration

For large batch processing - onboarding 200 subcontractors at once, or processing a backlog of scanned certificates - synchronous requests will hit rate limits and create long queues. The async pattern handles this cleanly.

Submit each PDF to the parse endpoint with an async=true parameter. The API immediately returns a job_id and processes the document in the background. When parsing completes, the API POSTs to your webhook URL with the full result payload.

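import json
import requests

API_KEY = "cpapi_live_sk_xxxxxxxxxxxxxxxx"

def submit_async_parse(pdf_path: str, vendor_id: str, project_id: str) -> str:
    # Submit async parse job
    with open(pdf_path, "rb") as f:
        response = requests.post(
            "https://api.coiparseapi.com/v1/parse",
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"document": ("coi.pdf", f, "application/pdf")},
            data={
                "async": "true",
                "webhook_url": "https://yourapp.com/webhooks/coi-parsed",
                # Metadata is echoed back in the webhook payload
                "metadata": json.dumps({"vendor_id": vendor_id, "project_id": project_id})
            },
            timeout=30
        )
    response.raise_for_status()

    job = response.json()
    print(f"Job submitted: {job['job_id']}")
    # Store job_id in your DB to correlate with webhook callback
    return job["job_id"]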

Your webhook endpoint receives a POST with Content-Type: application/json. The payload wraps the standard parse result with job metadata:

{
  "job_id": "job_abc123",
  "status": "completed",
  "metadata": {"vendor_id": "v_456", "project_id": "proj_789"},
  "result": { ... full parse result ... },
  "processed_at": "2026-03-24T14:22:10Z"
}

Implement webhook signature verification using the X-COI-Signature header - it's an HMAC-SHA256 of the raw request body signed with your webhook secret. Always verify before processing. For retry logic on your webhook endpoint, return HTTP 200 on success and any non-2xx code to trigger a retry. The API retries up to 5 times with exponential backoff (1s, 2s, 4s, 8s, 16s).
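
Signature verification can be sketched with the standard library's hmac module. One assumption to check against your dashboard documentation: the code below treats the X-COI-Signature value as a hex-encoded digest, which the text above does not specify. The function name verify_signature is hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(raw_body: bytes, header_value: Optional[str], secret: str) -> bool:
    """Check an X-COI-Signature header against the raw request body."""
    if not header_value:
        return False  # missing header: reject before doing any work
    # HMAC-SHA256 over the exact bytes received, keyed with the webhook secret
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # Assumption: the header carries a hex digest; adjust if it is base64
    received = header_value.strip().lower()
    # compare_digest runs in constant time to avoid leaking timing information
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Compute the HMAC over the raw bytes exactly as received, before any JSON parsing. Re-serializing the parsed payload can change whitespace or key order and cause valid signatures to fail.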

Handling Edge Cases

Real-world COI intake involves documents that are far from ideal. Plan for these before you go to production.

Scanned PDFs with low confidence

When parse_confidence falls below 0.75, treat the result as requiring human review rather than automated approval. Surface the low-confidence flag in your admin UI and route the certificate to a queue for manual verification. Do not auto-approve based on a parse that scored below your confidence threshold - OCR errors on coverage limits are common in low-resolution scans.
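
A small gate in front of the approval logic keeps low-confidence parses out of the automated path. This is a sketch: route_by_confidence and the queue labels are hypothetical, and the 0.75 default simply mirrors the threshold suggested above.

```python
REVIEW_THRESHOLD = 0.75  # threshold suggested above; tune to your own risk tolerance


def route_by_confidence(result: dict, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return 'manual_review' for low-confidence parses, else 'automated'."""
    confidence = result.get("parse_confidence")
    # Treat a missing confidence value as low confidence rather than guessing
    if confidence is None or confidence < threshold:
        return "manual_review"
    return "automated"


print(route_by_confidence({"parse_confidence": 0.97}))
print(route_by_confidence({"parse_confidence": 0.61}))
```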

Password-protected PDFs

The API returns HTTP 422 with "error": "pdf_password_protected" for encrypted documents. Your integration should catch this error code and prompt the submitter to re-upload an unprotected version. Some brokers routinely export password-protected PDFs - it's worth documenting this in your vendor submission instructions to prevent it upstream.

ACORD 28 vs ACORD 25 detection

The form_type field tells you which form was submitted. If you're expecting an ACORD 25 (liability) and receive an ACORD 28 (property), your compliance requirements don't apply and you should request the correct form. See the ACORD 25 vs ACORD 28 guide for a full breakdown of what each form covers and when you need both.

Null vs. zero vs. missing

These three values mean very different things in the response schema. null means the field exists on the form but was blank or unreadable. 0 means the field was explicitly filled with zero (rare but valid for certain endorsements). A field being entirely absent from the coverages object means that coverage type is not present on the certificate at all. Your compliance logic needs to distinguish between these - a null workers comp limit may mean it was cut off by bad scanning, while an absent workers comp block means the sub doesn't have that coverage.
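
The three cases can be told apart with a few explicit checks. The helper below is a sketch with hypothetical names and state labels; it also treats a coverage block whose value is null the same as an absent block, which is an assumption rather than documented behavior.

```python
def classify_limit(coverages: dict, line: str, field: str) -> str:
    """Classify one limit as coverage_absent, unreadable, explicit_zero, or present."""
    block = coverages.get(line)
    if block is None:
        return "coverage_absent"   # coverage type is not on the certificate
    value = block.get(field)
    if value is None:
        return "unreadable"        # field exists on the form but was blank or illegible
    if value == 0:
        return "explicit_zero"     # form explicitly states a zero limit
    return "present"


coverages = {
    "general_liability": {"each_occurrence": 1000000},
    "workers_compensation": {"el_each_accident": None},
}
print(classify_limit(coverages, "general_liability", "each_occurrence"))
print(classify_limit(coverages, "workers_compensation", "el_each_accident"))
print(classify_limit(coverages, "umbrella", "each_occurrence"))
```

Avoid truthiness checks such as `if not value` here, because they collapse null and zero into the same branch.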

Error Handling Reference

Every error response follows this structure:

{
  "error": "error_code_string",
  "message": "Human-readable explanation",
  "request_id": "req_abc123"
}

Log the request_id for every failed request - it's what support will ask for when diagnosing issues.

HTTP Status	Error Code	Meaning and Action
400	`not_a_pdf`	Uploaded file is not a PDF. Validate MIME type client-side before submitting.
400	`not_a_coi`	The PDF is a valid PDF but does not appear to be a Certificate of Insurance. Return the file to the submitter.
401	`invalid_api_key`	API key is missing, malformed, or revoked. Check your key from the dashboard.
402	`insufficient_credits`	Your plan has no remaining parse credits for this billing period. Upgrade or wait for reset.
413	`file_too_large`	PDF exceeds the size limit for your plan tier.
422	`pdf_password_protected`	PDF is encrypted. Request an unprotected version from the submitter.
422	`pdf_corrupted`	File is structurally damaged and cannot be read. Ask for a fresh export from the broker.
429	`rate_limited`	Too many requests. Check the `Retry-After` header and implement backoff.
500	`parse_failed`	Unexpected server error. Retry once - if it fails again, contact support with the `request_id`.

Pro tip: For batch processing jobs, implement a dead-letter queue for documents that hit not_a_coi or pdf_corrupted. These need human intervention - automatically retrying them just wastes credits. Route them to a separate review queue immediately.
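
In a batch worker, that routing can be reduced to a lookup on the error code. The grouping below is one reading of the table above, not an official classification, and the action labels and the next_action name are hypothetical.

```python
# Groupings are an interpretation of the error table, not an official classification
DEAD_LETTER = {"not_a_coi", "pdf_corrupted"}            # needs human review
RETRY = {"rate_limited", "parse_failed"}                # transient, retry with backoff
RESUBMIT = {"not_a_pdf", "pdf_password_protected", "file_too_large"}
HALT = {"invalid_api_key", "insufficient_credits"}      # every later request would fail too


def next_action(error_code: str) -> str:
    """Decide what a batch worker should do with a failed document."""
    if error_code in DEAD_LETTER:
        return "dead_letter"
    if error_code in RETRY:
        return "retry"
    if error_code in RESUBMIT:
        return "request_resubmission"
    if error_code in HALT:
        return "halt_batch"
    # Unknown codes go to review rather than being retried blindly
    return "dead_letter"


print(next_action("pdf_corrupted"))
print(next_action("rate_limited"))
```

Per the table, parse_failed should be retried only once, so the retry branch still needs an attempt counter on the caller's side.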

What to Build Next

With the core integration in place, the natural next steps are expiration monitoring (a scheduled job that queries your COI table for certificates expiring in the next 60 days and triggers renewal requests) and multi-project compliance tracking (mapping which vendors are approved for which specific projects, since a vendor who meets your requirements for a light commercial project may not meet the higher limits required on a large multi-family build).

For the operational workflows that sit on top of this integration, see our guide on automating COI verification and the deeper dive on building a COI compliance workflow for teams that need approval routing, escalation handling, and audit trails.

If you're building this into a property management or construction platform, the COI ParseAPI is currently in early access. The integration you built following this guide is production-ready from day one - there's no additional configuration required to go live.
