Case Study: File-to-Order Logistics Document Automation for an FMCG Company
The Challenge
A European FMCG company operating in the Polish market spent a significant amount of employee time manually rewriting logistics documents into the formats required by their 3PL operator — Raben. The process involved two key workflows:
Outbound orders — the company receives purchase orders from its trading partners, each in a completely different format: some as Excel spreadsheets, others as PDFs, and one as plain-text files. Every order had to be retyped manually into Raben's standardized 48-column template before shipping could be scheduled.
Inbound deliveries — delivery advices from the supplier arrived as bilingual PDF documents with a complex nested structure: product blocks containing multiple batches, each with its own expiration date and carton count. This data had to be retyped into Raben's 21-column template.
Key issues
- 25+ different input formats — each trading partner uses their own document layout, column naming, product identifiers, and date formats.
- Manual SKU translation — client product codes had to be looked up in an internal matrix to find the correct warehouse SKU codes.
- Multi-warehouse address resolution — over half of the partners deliver to multiple distribution centers; the correct warehouse ID had to be manually identified for each order.
- Quantity conversion — some partners report quantities in cartons, others in units, which required manual multiplication for every single product.
- High error rate — manually retyping hundreds of product lines a day frequently led to mistakes.
- Time pressure — logistics deadlines required same-day order processing.
The Solution
I built a Python desktop application that fully automates both processes. The tool runs exclusively on the client's local computer — no data is sent to external services or the cloud.
How it works
The user drops the source files into a designated folder, clicks a single button, and within seconds receives a ready-to-use XLSX file fully compliant with the Raben template. Processed files are automatically archived by date.
Internal application logic:
- Source identification — analyzing the file structure, content signatures, and format to determine which trading partner (or supplier) the document came from.
- Data extraction — a dedicated parser for each format reads the relevant fields using pattern matching, table extraction, and state machine logic.
- Product code resolution — translating client-specific product identifiers into internal warehouse SKUs based on a master data matrix.
- Delivery address resolution — for multi-warehouse partners, the system determines the correct Raben warehouse ID based on the delivery address using a multi-level fuzzy matching engine (postal codes, city names, warehouse codes, normalized text matching).
- Output file generation — writing a complete, ready-to-upload XLSX file populated with all constant values, sequential order numbers, and converted quantities.
- Archiving — moving processed files into date-stamped folders; files containing errors are sent to a verification queue.
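The identification and resolution steps above can be sketched in a few lines. This is an illustrative simplification, not the production code — the partner signature, the SKU matrix contents, and all names (identify_source, SKU_MATRIX, resolve_line) are invented for the example:

```python
# Hypothetical master-data matrix:
# (partner, partner product code) -> (warehouse SKU, units per carton)
SKU_MATRIX = {
    ("PartnerA", "A-1001"): ("WH-55501", 12),
}

def identify_source(first_line: str) -> str:
    """Guess the trading partner from a content signature in the file."""
    if first_line.startswith("ORDER;"):
        return "PartnerA"
    raise ValueError("unknown source format")

def resolve_line(partner: str, product_code: str,
                 qty: int, unit: str) -> tuple[str, int]:
    """Translate a partner product code to a warehouse SKU and,
    if the partner reports cartons, convert the quantity to units."""
    sku, per_carton = SKU_MATRIX[(partner, product_code)]
    units = qty * per_carton if unit == "cartons" else qty
    return sku, units
```

For example, `resolve_line("PartnerA", "A-1001", 5, "cartons")` yields `("WH-55501", 60)` — the carton-to-unit multiplication that previously had to be done by hand for every product line.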
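The final generation step can be illustrated with openpyxl, which the project already uses for Excel handling. The column layout below is a deliberately simplified stand-in, not the real 48-column Raben template:

```python
from openpyxl import Workbook

def write_order(rows, path, start_number=1):
    """Write resolved order lines into a simplified XLSX template.
    `rows` holds (warehouse_sku, units, warehouse_id) tuples; order
    numbers are assigned sequentially starting at `start_number`."""
    wb = Workbook()
    ws = wb.active
    ws.append(["OrderNo", "WarehouseSKU", "Units", "WarehouseID"])  # header row
    for offset, (sku, units, wh) in enumerate(rows):
        ws.append([start_number + offset, sku, units, wh])
    wb.save(path)
```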
Technical Challenges
Cryptographic PDF decoding — one trading partner's system generates PDFs with custom font encoding (CID), where characters are replaced by numeric codes. Standard PDF libraries returned unreadable gibberish. I developed a technique that uses known embedded text patterns (file paths, order numbers) to automatically reconstruct the character mapping — essentially a known-plaintext attack that decodes the document without any manual intervention.
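The core of the known-plaintext idea can be shown in miniature. Assuming a run of extracted numeric codes whose plaintext is known in advance (e.g. a fixed file path or order-number prefix that appears in every document), aligning the two position by position recovers the code-to-character table; the helper names and sample codes below are illustrative:

```python
def build_mapping(known_codes: list[int], known_text: str) -> dict[int, str]:
    """Align known plaintext against its extracted codes to recover
    the code -> character table, rejecting inconsistent alignments."""
    mapping: dict[int, str] = {}
    for code, char in zip(known_codes, known_text):
        if code in mapping and mapping[code] != char:
            raise ValueError(f"conflicting mapping for code {code}")
        mapping[code] = char
    return mapping

def decode(codes: list[int], mapping: dict[int, str]) -> str:
    """Decode extracted codes; unknown codes become '?'."""
    return "".join(mapping.get(c, "?") for c in codes)
```

With enough known text the mapping covers the document's full character set, so the rest of the PDF decodes without manual intervention.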
Bilingual delivery advice parser — bilingual PDF delivery advices have a nested structure where a single product might span multiple batches, each with a different expiration date and quantities expressed in cartons instead of individual units. The parser utilizes a state machine to track context across rows and accurately calculates the unit quantities based on carton counts and packaging ratios.
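A stripped-down version of the state machine conveys the idea. The `P;`/`B;` line format here is invented for the example — the real documents are PDFs — but the context tracking and carton-to-unit arithmetic mirror the description above:

```python
def parse_advice(lines: list[str]) -> list[dict]:
    """Parse a flat line stream into nested product/batch records.
    A 'P' line opens a product block (SKU, units per carton); each
    following 'B' line is a batch (expiry date, carton count) that
    belongs to the most recently opened product."""
    products: list[dict] = []
    current = None
    for line in lines:
        kind, *fields = line.split(";")
        if kind == "P":
            current = {"sku": fields[0],
                       "per_carton": int(fields[1]),
                       "batches": []}
            products.append(current)
        elif kind == "B" and current is not None:
            expiry, cartons = fields[0], int(fields[1])
            current["batches"].append(
                {"expiry": expiry,
                 "units": cartons * current["per_carton"]})  # cartons -> units
    return products
```

The state (the current product block) carries across rows, which is what lets one product span any number of batch lines.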
Intelligent address matching — the address resolution system handles real-world data issues: Polish diacritics lost during PDF extraction (Wyszków → Wyszkow), multiple warehouses in the same city requiring disambiguation by warehouse codes, and cross-client collisions when different companies have warehouses in the same city. The system employs client-filtered indexes with normalized text matching.
Dual-process GUI — a clean, tabbed interface allows non-technical employees to easily switch between processing outbound orders and inbound deliveries. It includes built-in numbering management, progress tracking, and error reporting.
Results
| Metric | Before | After |
|---|---|---|
| Batch processing time | 45–90 min (manual) | Under 10 seconds |
| Error rate | Frequent (manual rewriting) | Near zero (automated validation) |
| Format identification | Required domain expertise | Fully automated |
| Data security | N/A | 100% local processing, zero cloud exposure |
| Employee dependency | Trained operator required | Any team member can operate |
What changed
- Reclaimed hours weekly — what used to consume a significant portion of the logistics coordinator's day now takes seconds.
- Elimination of human error — automated SKU resolution and quantity conversions removed the most common source of mistakes.
- Removal of specialist dependency — previously, only one trained person could process orders; now anyone on the team can click the button.
- Scalability — adding a new trading partner requires only a new parser module; the rest of the pipeline works unchanged.
- Resilience to changes — when a supplier modified their PDF layout, the modular architecture allowed for a targeted fix without affecting the rest of the system.
Technology
| Component | Details |
|---|---|
| Language | Python 3.11+ |
| GUI | Tkinter (native, no browser) |
| PDF processing | pdfplumber |
| Excel handling | openpyxl, xlrd |
| Architecture | Modular pipeline with interchangeable parsers |
| Deployment | Local application, serverless, cloudless |
| Data security | On-device processing only; data never leaves the computer |
This solution was built as a dedicated business process automation project. The application runs entirely on the client's local hardware, guaranteeing that sensitive commercial data — pricing, order volumes, customer relationships — never leaves their control.