Reading supplier stock CSVs: what goes wrong and how to catch it
CSV is supposed to be the simple format. Seven commas, a newline, twenty years of tooling, and yet every new supplier we onboard finds a fresh way to break our parser. What follows is a catalogue, loosely sorted by how often we see it, of the ways a stock export sabotages an ingest.
Header drift
"SKU" becomes "Sku" becomes "Product Code" becomes "Item No.". The column you mapped last week isn't there this week. The fix is to pick column names per supplier once, make the mapping explicit, and alert when the expected column is missing — don't silently fall back to a best-match heuristic because that's how you end up writing a price value into an on-hand field.
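The explicit-mapping rule can be sketched as a tiny lookup that throws instead of guessing. This is an illustrative sketch — `ColumnMap`, `acmeMap`, and `mapHeader` are hypothetical names, not a real API:

```typescript
// Hypothetical sketch: one explicit header mapping per supplier,
// failing loudly when an expected column disappears.
type ColumnMap = Record<string, string>; // our field -> supplier's header

const acmeMap: ColumnMap = { sku: "Product Code", onHand: "Qty Available" };

function mapHeader(headers: string[], map: ColumnMap): Record<string, number> {
  const indices: Record<string, number> = {};
  for (const [field, supplierName] of Object.entries(map)) {
    const idx = headers.indexOf(supplierName);
    if (idx === -1) {
      // No best-match fallback: a missing column is an alert, not a guess.
      throw new Error(`Expected column "${supplierName}" (for ${field}) not found`);
    }
    indices[field] = idx;
  }
  return indices;
}
```

The point of the throw is the alerting behaviour: a renamed column stops the ingest rather than silently remapping.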
Row 1 isn't the header
Some suppliers ship a three-line preamble: company name, generated timestamp, blank row, then the headers. Excel does this by default if the export was designed for human eyes. Make `headerRow` configurable per supplier. Default to 1 and let merchants override it.
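A minimal sketch of the configurable header row, assuming the file has already been split into lines. `SupplierConfig` and `skipPreamble` are illustrative names:

```typescript
// Sketch: per-supplier headerRow (1-based, default 1) skips any preamble.
interface SupplierConfig { headerRow: number }

function skipPreamble(lines: string[], cfg: SupplierConfig): { header: string; rows: string[] } {
  const i = cfg.headerRow - 1; // convert 1-based config to 0-based index
  if (i >= lines.length) throw new Error(`headerRow ${cfg.headerRow} is past end of file`);
  return { header: lines[i], rows: lines.slice(i + 1) };
}
```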
Encoding surprises
The big ones we've seen:
- UTF-16 LE with BOM. Someone opened the file in Excel on Windows and re-saved it. Your UTF-8 parser reads it as a string of null-byte-prefixed gibberish. Detect the BOM; try UTF-16 if UTF-8 parsing fails.
- Latin-1 / Windows-1252. Accented characters come through as mojibake like `£` or `’`. Not fatal for stock quantities, but it corrupts product titles. Detect and recode; don't try to force everything to UTF-8 blindly.
- Mixed line endings. CRLF inside fields that were copy-pasted from Word. Use a parser that handles embedded newlines inside quoted fields (Papaparse and csv-parse both do; naïve split-on-newline doesn't).
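The BOM-sniffing step above can be sketched with the standard `TextDecoder`. This is one way to do it, assuming the raw bytes are available before parsing; `decodeExport` is a hypothetical name:

```typescript
// Sketch of BOM sniffing before decoding. UTF-16 LE BOM is FF FE;
// UTF-8 BOM is EF BB BF; with no BOM, fall back to UTF-8.
function decodeExport(buf: Uint8Array): string {
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return new TextDecoder("utf-16le").decode(buf.subarray(2));
  }
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return new TextDecoder("utf-8").decode(buf.subarray(3));
  }
  return new TextDecoder("utf-8").decode(buf); // no BOM: assume UTF-8
}
```

A fuller version would also run a Windows-1252 heuristic when UTF-8 decoding produces replacement characters.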
Numbers that aren't numbers
Quantity columns arrive as:
- `"100"` — quoted, because their ERP quotes every field.
- `" 100 "` — quoted and padded.
- `1,000` — thousands separator, comma, in a CSV. Ambiguous at best, silently wrong when parsed as a string.
- `100.00` — when the stock is integer but the system stores all quantities as floats.
- `"1.000,50"` — European number format. Period as thousands separator, comma as decimal. Bring a locale.
- `<10` or `1000+` — ranged values used by some B2B suppliers to hide exact levels.
Cast to integer after trimming, removing thousands separators, and flagging ranges. If it still can't be coerced, report the row — don't drop silently.
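One way to implement that coercion, assuming the CSV parser has already stripped the surrounding quotes. The `Parsed` result shape and the `locale` flag are assumptions for the sketch, not a real API:

```typescript
// Sketch of quantity coercion: trim, flag ranges, strip separators
// per locale, then cast to integer — reporting rather than dropping.
type Parsed =
  | { kind: "qty"; value: number }
  | { kind: "range"; raw: string }  // e.g. "<10", "1000+": flag, don't guess
  | { kind: "bad"; raw: string };   // report the row, never drop silently

function parseQuantity(raw: string, locale: "us" | "eu" = "us"): Parsed {
  const s = raw.trim();
  // Ranged values like "<10" or "1000+" cannot be coerced honestly.
  if (/^[<>]\s*\d+$/.test(s) || /^\d+\s*\+$/.test(s)) return { kind: "range", raw: s };
  const normalized =
    locale === "eu"
      ? s.replace(/\./g, "").replace(",", ".") // "1.000,50" -> "1000.50"
      : s.replace(/,/g, "");                   // "1,000"    -> "1000"
  const n = Number(normalized);
  if (!Number.isFinite(n)) return { kind: "bad", raw: s };
  return { kind: "qty", value: Math.trunc(n) }; // stock levels are integers
}
```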
XLSX is worse, actually
XLSX is CSV's ambitious cousin. Three failure modes we see regularly:
- Multiple sheets. The stock data is on sheet 2, not sheet 1. The first sheet is usually an "Instructions" or "Key" tab. Make sheet name configurable per supplier.
- Formulas, not values. The quantity cell is `=Summary!B17`. Your parser reads the formula string. Always resolve to cached values — libraries like `xlsx` (SheetJS) keep the cached result alongside the formula, but you have to read the value, not the formula.
- Dates as serial numbers. You see `45853` instead of `2025-07-15`. Excel epochs. Convert only when the column is a date column — don't blanket-translate numeric cells.
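The serial-to-date conversion is a one-liner once you know the epoch. A sketch assuming the common 1900 date system (epoch 1899-12-30, which absorbs Excel's phantom 1900 leap day for serials after February 1900); `excelSerialToISODate` is an illustrative name:

```typescript
// Sketch: convert an Excel 1900-system date serial to an ISO date string.
// Apply only to columns you know are dates.
function excelSerialToISODate(serial: number): string {
  const epoch = Date.UTC(1899, 11, 30); // 30 Dec 1899 (month is 0-based)
  const d = new Date(epoch + Math.round(serial) * 86_400_000);
  return d.toISOString().slice(0, 10);
}
```

Files saved in the rarer 1904 date system (historically a Mac Excel default) need a different epoch, which is another reason not to blanket-translate.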
Zero rows is a canary
The most important check in the whole pipeline: if a file came in, parsed without error, and produced zero rows, alert. That almost always means the supplier changed their format and you're now silently missing all their updates. A "succeeded" ingest with zero rows is worse than a failed one, because nobody looks.
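The canary is cheap to implement. A sketch with illustrative names — `IngestResult` and `checkCanary` are not a real API:

```typescript
// Sketch: a clean parse with zero data rows is itself an alert condition.
interface IngestResult { supplier: string; rows: unknown[]; errors: string[] }

function checkCanary(result: IngestResult): string[] {
  const alerts: string[] = [];
  if (result.errors.length === 0 && result.rows.length === 0) {
    alerts.push(`${result.supplier}: parsed cleanly but produced 0 rows — likely format change`);
  }
  return alerts;
}
```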
Defences that actually work
- Explicit per-supplier column mapping. Never auto-infer past the first preview.
- Fail loudly on missing columns, unexpected formats, zero rows.
- Keep the raw attachment. When a parse fails three weeks later, you need the original bytes to diagnose.
- Surface unmatched SKUs as a first-class audit output, not a log line.
- Sender allowlist per supplier. Spoofed From headers are the classic "stock suddenly zero" attack.
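One way to make these defences concrete is a single per-supplier config object that forces every decision to be explicit. The field names here are illustrative, not a real schema:

```typescript
// Sketch: everything the pipeline needs to know about one supplier,
// stated up front instead of inferred at parse time.
interface SupplierIngestConfig {
  columnMap: Record<string, string>; // our field -> supplier header, no inference
  headerRow: number;                 // 1-based; preamble rows above are skipped
  sheetName?: string;                // XLSX only: which tab holds the data
  numberLocale: "us" | "eu";         // disambiguates "1,000" vs "1.000,50"
  allowedSenders: string[];          // email From allowlist for this supplier
}

const example: SupplierIngestConfig = {
  columnMap: { sku: "Item No.", onHand: "Qty" },
  headerRow: 4,
  numberLocale: "eu",
  allowedSenders: ["stock@example.com"],
};
```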
Stockpost does all of this. See how the pipeline fits together, or install on a 14-day trial.