How to Validate NACHA Batch Headers Programmatically

An ACH file clears the transfer boundary, the File Header (Type 1) parses, and the ingestion worker starts slicing entries — but every Entry Detail record in the second batch posts against the wrong Standard Entry Class because the Batch Header (Type 5) that opened it carried a Service Class Code of 225 while the batch actually held credits. Nothing raised an exception; the batch simply reconciled against the wrong ruleset until an operations analyst traced a day's worth of misrouted returns back to a single unvalidated header. The Batch Header is the operational envelope for every entry beneath it, so validating it before any entry-level processing is the cheapest fault you will ever prevent. This page sits directly under the byte-level NACHA record layouts guide and, within the broader Core Architecture & Payment File Standards framework, drills into the one record type that governs how a whole batch is interpreted.

The header arrives already framed by the upstream fixed-width file decoding stage; this validator's job is to enforce its meaning — that the Service Class Code, SEC code, ODFI routing prefix, and effective entry date are individually well-formed and mutually consistent — so a malformed envelope is quarantined at ingestion instead of corrupting the transaction matching and reconciliation layer downstream. It is the Type 5 sibling of validating NACHA addenda records with Pydantic, and it shares the same discipline: strict enums, immutable state, and structured errors that preserve the raw bytes.

Concept Spec: The Type 5 Field Map

A NACHA Batch Header is exactly 94 ASCII characters. The format is positional, not delimited — unused alphanumeric fields are space-padded, numeric fields are zero-padded, and both are significant. Because each field is read exactly once by a fixed slice, validating a header is $O(1)$ in the record length with $O(1)$ resident memory; validating an entire file of $n$ headers is $O(n)$ with no per-record allocation beyond the frozen result object. The offsets below are given both 1-indexed (as the NACHA spec prints them) and as 0-indexed Python slices, which is the representation that eliminates the off-by-one errors that dominate positional-parser bugs.

Field	Positions (1-indexed)	Slice (0-indexed)	Rule enforced
Record Type Code	1	`[0:1]`	Must equal `5`
Service Class Code	2–4	`[1:4]`	`200`, `220`, `225`, `280`
Company Name	5–20	`[4:20]`	Left-justified, space-padded
Company Discretionary Data	21–40	`[20:40]`	Optional, free text
Company Identification	41–50	`[40:50]`	IRS EIN or NACHA-assigned
Standard Entry Class Code	51–53	`[50:53]`	PPD, CCD, WEB, TEL, CTX, …
Company Entry Description	54–63	`[53:63]`	Free text (e.g. `PAYROLL`)
Company Descriptive Date	64–69	`[63:69]`	Optional, `YYMMDD`
Effective Entry Date	70–75	`[69:75]`	`YYMMDD`, valid calendar date
Settlement Date	76–78	`[75:78]`	Julian, filled by ACH Operator
Originator Status Code	79	`[78:79]`	`1` (ODFI) or `2` (Fed)
ODFI Routing Number	80–87	`[79:87]`	First 8 digits of ABA RTN
Batch Number	88–94	`[87:94]`	7-digit, sequential in file

The ODFI Routing Number field carries only the first 8 digits of the 9-digit ABA routing number; the check digit lives in the Entry Detail record, not the header, so a mod-10 verification here would always fail. That single fact is the most common cause of false rejects in first-generation batch validators.

Full Annotated Implementation

The validator below uses a frozen dataclass for immutable post-validation state, static methods for zero-overhead field checks, and a structured exception that carries the failing field and the raw bytes for audit routing. Input is accepted as bytes so decoding happens exactly once per record, and the byte-length gate runs first — a wrong length means the slices are meaningless, so there is no point checking anything else.

python

from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# NACHA Operating Rules, Batch Header: valid Service Class Codes.
# 200 mixed debits+credits, 220 credits only, 225 debits only,
# 280 automated accounting advices (ADV).
VALID_SERVICE_CLASSES: frozenset[str] = frozenset({"200", "220", "225", "280"})

# Standard Entry Class codes accepted by this pipeline. Restrict this set to
# the SEC codes your origination profile actually supports (see Calibration).
VALID_SEC_CODES: frozenset[str] = frozenset(
    {"PPD", "CCD", "WEB", "TEL", "CTX", "ARC", "BOC", "POP", "RCK", "SHR"}
)

# "1" = ODFI originating directly, "2" = Federal Reserve originating as agent.
VALID_ORIGINATOR_STATUS: frozenset[str] = frozenset({"1", "2"})


@dataclass(frozen=True)
class NACHABatchHeader:
    """Immutable, hashable Batch Header the matching engine can dedupe safely."""

    service_class_code: str
    company_name: str
    company_id: str
    sec_code: str
    company_entry_desc: str
    descriptive_date: Optional[datetime]
    effective_entry_date: datetime
    settlement_date: Optional[str]
    originator_status_code: str
    odfi_rtn: str          # 8-digit prefix of the 9-digit ODFI ABA routing number
    batch_number: int


class BatchHeaderValidationError(Exception):
    """Structured failure carrying field-level context for exception routing."""

    def __init__(self, message: str, field: str, raw_value: str) -> None:
        super().__init__(message)
        self.field = field
        self.raw_value = raw_value


def _parse_yymmdd(value: str) -> Optional[datetime]:
    """Parse a NACHA YYMMDD date; blank (all-space) fields are a valid None."""
    if not value.strip():
        return None
    return datetime.strptime(value, "%y%m%d")  # raises ValueError on Feb 30 etc.


def parse_and_validate(raw_line: bytes) -> NACHABatchHeader:
    """Frame, validate, and return a NACHA Batch Header (Type 5) record.

    Aggregates every field-level failure into one raised
    ``BatchHeaderValidationError`` so operations sees the complete diagnosis,
    not just the first broken field.
    """
    if len(raw_line) != 94:
        raise BatchHeaderValidationError(
            f"Line length mismatch: expected 94, got {len(raw_line)}",
            field="RECORD_LENGTH",
            raw_value=raw_line.decode("ascii", errors="replace"),
        )

    # Strip only the terminator; interior padding is structural and significant.
    line = raw_line.decode("ascii").rstrip("\r\n")

    record_type            = line[0:1]
    service_class_code     = line[1:4]
    company_name           = line[4:20]
    company_id             = line[40:50]
    sec_code               = line[50:53]
    company_entry_desc     = line[53:63]
    descriptive_date_str   = line[63:69]
    effective_date_str     = line[69:75]
    settlement_date_str    = line[75:78]
    originator_status_code = line[78:79]
    odfi_rtn               = line[79:87]   # positions 80–87, 8 digits
    batch_number_str       = line[87:94]

    errors: list[str] = []

    if record_type != "5":
        errors.append(f"Record type must be '5', got {record_type!r}")
    if service_class_code not in VALID_SERVICE_CLASSES:
        errors.append(f"Invalid Service Class Code: {service_class_code!r}")
    if sec_code not in VALID_SEC_CODES:
        errors.append(f"Invalid SEC Code: {sec_code!r}")
    if originator_status_code not in VALID_ORIGINATOR_STATUS:
        errors.append(f"Invalid Originator Status Code: {originator_status_code!r}")
    # 8-digit field: the 9th ABA check digit is not present in the header.
    if not (odfi_rtn.isdigit() and len(odfi_rtn) == 8):
        errors.append(f"Invalid ODFI RTN format: {odfi_rtn!r}")

    effective_entry_date: Optional[datetime] = None
    try:
        effective_entry_date = datetime.strptime(effective_date_str, "%y%m%d")
    except ValueError:
        errors.append(f"Malformed Effective Entry Date: {effective_date_str!r}")

    descriptive_date: Optional[datetime] = None
    try:
        descriptive_date = _parse_yymmdd(descriptive_date_str)
    except ValueError:
        errors.append(f"Malformed Descriptive Date: {descriptive_date_str!r}")

    batch_number = 0
    try:
        batch_number = int(batch_number_str)
    except ValueError:
        errors.append(f"Malformed Batch Number: {batch_number_str!r}")

    if errors or effective_entry_date is None:
        raise BatchHeaderValidationError(
            f"Batch header validation failed: {'; '.join(errors)}",
            field="MULTIPLE",
            raw_value=line,
        )

    settlement_date = settlement_date_str.strip() or None
    return NACHABatchHeader(
        service_class_code=service_class_code,
        company_name=company_name.strip(),
        company_id=company_id.strip(),
        sec_code=sec_code,
        company_entry_desc=company_entry_desc.strip(),
        descriptive_date=descriptive_date,
        effective_entry_date=effective_entry_date,
        settlement_date=settlement_date,
        originator_status_code=originator_status_code,
        odfi_rtn=odfi_rtn,
        batch_number=batch_number,
    )

The streaming driver keeps the file on disk and yields one validated header at a time, so a multi-gigabyte file never materializes in memory. It never lets a single bad header abort the run — malformed records are surfaced as error payloads destined for a dead-letter queue, the same generator discipline the sibling high-volume pandas.read_fwf guide applies to columnar loads.

python

from collections.abc import Iterator


def stream_batch_headers(path: str) -> Iterator[dict]:
    """Yield each Type 5 header as a validated record or an error payload.

    Non-header record types are skipped silently; the pipeline never aborts on
    a bad line, and every failure keeps its original bytes for the audit trail.
    """
    with open(path, "rb") as fh:
        for raw in fh:
            line = raw.rstrip(b"\r\n")
            if not line[0:1] == b"5":
                continue  # not a Batch Header — skip without error
            try:
                header = parse_and_validate(line)
                yield {"ok": True, "header": header}
            except BatchHeaderValidationError as exc:
                yield {
                    "ok": False,
                    "field": exc.field,
                    "error": str(exc),
                    "raw_record": exc.raw_value,  # exact bytes for DLQ + audit
                }

Calibration & Configuration

The three allowlists are policy surfaces, not constants — tune them to the rail and the origination profile rather than shipping the permissive defaults. VALID_SERVICE_CLASSES should be narrowed when you know your inbound stream: a receive-only reconciliation feed for a corporate cash-management product will only ever see 220 and 225, so accepting 280 (ADV) there means a mislabeled accounting-advice batch slips through as if it were live money. Load the set from the same configuration source your ACH origination agreement uses so the two never drift.

VALID_SEC_CODES is the field most worth constraining. A consumer-only pipeline that suddenly receives CCD (a corporate-account code) is more likely looking at a misrouted file than a legitimate entry, and WEB/TEL carry authorization and dispute-window obligations a corporate-only pipeline is not provisioned to honor. Keep this allowlist per-product, not global. For Same Day ACH, add an effective-entry-date sanity check on top of the parse: reject a header whose effective_entry_date is more than a few banking days in the future or already in the past, since a stale date silently pushes settlement into the wrong window. That threshold is a configuration value, not a hardcoded constant — express it as a business-day delta so weekend and Federal Reserve holiday rollovers do not trip false rejects.

Validation Example: Before and After

Consider a real-looking Batch Header for a credits-only payroll batch (220, PPD) originated by ODFI 09100001, effective July 3 2025:

text

5220ACME PAYROLL LLC                    1234567890PPDPAYROLL         250703   1091000010000001

Passing that 94-byte string to parse_and_validate(raw.encode("ascii")) frames each slice, confirms 220 is an allowed Service Class Code and PPD an allowed SEC code, verifies the 8-digit ODFI prefix 09100001, parses 250703 into a datetime(2025, 7, 3), and returns a frozen NACHABatchHeader. Now corrupt three bytes — replace the Service Class Code with the invalid 999:

text

5999ACME PAYROLL LLC                    1234567890PPDPAYROLL         250703   1091000010000001

The length gate still passes (it is 94 bytes), but the Service Class Code check appends Invalid Service Class Code: '999' to the error list. Because every other field is well-formed, the raised BatchHeaderValidationError carries field="MULTIPLE" and a message naming exactly the one broken field, with raw_value holding the full corrupted line. The streaming driver catches it, emits {"ok": False, "field": "MULTIPLE", "raw_record": "5999ACME…"}, and moves to the next line — the ledger never sees the batch, and the audit trail retains the exact bytes for manual review.

Failure Modes & Guardrails

Three edge cases cause silent corruption in production header pipelines, and each has a specific guard.

Hidden byte-alignment shifts pass the length gate. A UTF-8 BOM (\xEF\xBB\xBF) prepended by a Windows editor, or a lone \r mid-line, shifts every field one to three positions right — and the record can still be 94 bytes, so the length check does not catch it. The guard is to open in binary (rb) and decode with encoding="ascii": a BOM's high bytes raise UnicodeDecodeError at decode time rather than producing a plausible-but-wrong Service Class Code. Inspect suspect files with hexdump -C or repr() before trusting any positional slice.
Two-digit-year rollover parses an impossible date as valid. The effective entry date is YYMMDD, so 250703 is unambiguous — but strptime will happily reject 250230 (Feb 30) with ValueError, which is caught, and will silently accept a syntactically valid date that is operationally wrong, such as a header whose year has rolled to 26 while the file was queued across a New Year boundary. The parse alone is not enough; layer the business-day window check from the Calibration section so a valid-but-stale date is quarantined instead of settling late.
An empty or space-padded ODFI field is not a routing number. A truncated transmission can leave odfi_rtn as eight spaces, which str.isdigit() correctly rejects — but a naïve int(odfi_rtn) (a tempting "simplification") would raise on spaces yet silently strip a leading zero from a value like 09100001, turning it into 9100001 and misrouting the batch. Never coerce the routing prefix to an integer; validate it as an 8-character digit string and store it as a string. Failing this field upstream is what prevents R01 (Insufficient Funds) and R03 (No Account) returns that would otherwise blow through your settlement SLA.

Headers that clear all three guards are canonical and flow forward to entry-level parsing and the transaction matching and reconciliation algorithms; everything else is quarantined with its original bytes intact. For the authoritative rulebook on Service Class Codes, SEC codes, and return semantics, consult the Federal Reserve FedACH guidelines.

Up: NACHA Record Layouts Explained — the byte-level record grammar this header validator lives inside.
Validating NACHA Addenda Records with Pydantic — the Type 7 sibling, framed with declarative Pydantic models.
Handling Encoding Drift in Legacy Bank Exports — why encoding="ascii" upstream stops the BOM/alignment failure above.
Optimizing pandas.read_fwf for 1GB NACHA Files — memory-bounded columnar decoding of the same files.

How to Validate NACHA Batch Headers Programmatically #

Concept Spec: The Type 5 Field Map #

Full Annotated Implementation #

Calibration & Configuration #

Validation Example: Before and After #

Failure Modes & Guardrails #

Related #