
AI & Automation, Custom Development

99%+ Accuracy in Automated B2B Inquiry Classification, Zero Fine-Tuning

A large Czech industrial enterprise's domain experts spent up to five hours a week manually classifying unstructured B2B inquiries into more than 30 technical parameters. We built an LLM pipeline that handles it in minutes, with over 99% accuracy, achieved entirely through prompt engineering, not fine-tuning.


What We Built

Unstructured Document Processing

An OCR and LLM pipeline that handles PDFs, scanned documents, and handwritten specifications at scale, regardless of format or language variation.

Multi-Parameter Extraction Engine

One dedicated LLM call per parameter, each returning a structured JSON value and a verbatim citation from the source document for expert verification.

Confidence-Based Routing

Semi-deterministic confidence scoring per parameter, routing high-confidence extractions to review-ready status and flagging uncertain values for expert attention.

Self-Improving Feedback Loop

Expert corrections and explanations feed back into the prompt context, improving classification accuracy over time without model retraining.

Kanban Review Board

A web interface with email notifications, per-expert assignment, and Excel export that fits into existing sales workflows.

Challenge

A large Czech industrial enterprise's sales department handles a high volume of B2B inquiries every week. Each arrives in a different format: some as structured PDFs, others as scanned images of handwritten technical specifications, some as plain text with mixed-language annotations.

Every inquiry lands with a team of domain experts who know the field in depth. They classify each submission into more than 30 parameters, including material dimensions, chemical composition, mechanical properties, surface treatment, and packaging specifications. Before our involvement, the experts spent up to five hours a week on this classification.

The bottleneck had a direct commercial cost. In B2B heavy industry, response speed is one of the key competitive factors. Every hour of classification delay was a disadvantage: not because the quotes were worse, but because they arrived later.

Off-the-shelf OCR tools weren't reliable enough across the diverse input formats, and standard document parsing couldn't handle the domain specificity. Errors were costly: a mislabeled material parameter meant either rework or an incorrect quote. The company needed a system that was both accurate enough to trust and fast enough to matter.

Solution

We built an LLM workflow that processes each inquiry from raw input to structured output, ready for the sales team in minutes.

Document parsing. The pipeline starts with OCR on the visual input, then splits multi-item inquiries into individual line items. A single inquiry may cover ten different material variants with different specifications. The system handles each item independently.

One call per parameter. Rather than extracting all parameters in a single LLM call, the system uses a dedicated extraction call for each parameter. This prevents cross-parameter contamination: the model's reasoning stays focused on a single attribute, and an error in one parameter cannot cascade into others. Each call returns a structured JSON value plus a verbatim citation from the source document, so experts can verify any extraction without re-reading the original.
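
The one-call-per-parameter pattern can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the prompt wording, the `Extraction` type, and the `call_llm` callable (any function that takes a prompt string and returns the model's text reply) are assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extraction:
    parameter: str
    value: Optional[str]
    citation: Optional[str]
    citation_verified: bool  # True only if the citation appears verbatim in the source


def build_prompt(parameter: str, description: str, document_text: str) -> str:
    """Build a prompt that asks about exactly one parameter (hypothetical wording)."""
    return (
        f"Extract the parameter '{parameter}' ({description}) from the inquiry below.\n"
        'Reply with JSON only: {"value": <string or null>, '
        '"citation": <verbatim quote or null>}.\n\n'
        f"Inquiry:\n{document_text}"
    )


def extract_parameter(
    parameter: str,
    description: str,
    document_text: str,
    call_llm: Callable[[str], str],
) -> Extraction:
    """Run one dedicated call for one parameter and validate the reply."""
    raw = call_llm(build_prompt(parameter, description, document_text))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # A malformed reply affects only this parameter, not the others.
        return Extraction(parameter, None, None, False)
    value = data.get("value")
    citation = data.get("citation")
    verified = isinstance(citation, str) and citation != "" and citation in document_text
    return Extraction(
        parameter,
        str(value) if value is not None else None,
        citation if isinstance(citation, str) else None,
        verified,
    )


def extract_all(
    parameters: dict[str, str],
    document_text: str,
    call_llm: Callable[[str], str],
) -> dict[str, Extraction]:
    """Extract every parameter independently; calls share no state."""
    return {
        name: extract_parameter(name, description, document_text, call_llm)
        for name, description in parameters.items()
    }
```

Because each call is independent, a failed or malformed reply for one parameter leaves the others untouched, and the calls could also be issued concurrently.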

No fine-tuning required. The model we selected already had strong knowledge of the domain: material properties, technical notation, industry standards. The challenge lay in how we framed each extraction task, not in what the model knew. The team reached >99% classification accuracy through prompt engineering, structured output schemas, and iterative evaluation against a benchmark of 60 real inquiries with expert-prepared expected outputs.
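
Iterative evaluation against such a benchmark can be sketched as a simple scoring function. The counting rule here is an assumption: the case study does not say whether accuracy is measured per parameter or per inquiry, so this example counts each parameter value as one data point and compares values after light normalization. The function names and data layout are hypothetical.

```python
from typing import Optional


def normalize(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace and ignore case so trivial differences don't count as errors."""
    if value is None:
        return None
    return " ".join(str(value).split()).casefold()


def score_benchmark(predicted: dict, expected: dict):
    """Compare {inquiry_id: {parameter: value}} mappings.

    Returns (accuracy, mismatches), where each mismatch is a tuple
    (inquiry_id, parameter, expected_value, predicted_value).
    """
    total = 0
    correct = 0
    mismatches = []
    for inquiry_id, expected_params in expected.items():
        predicted_params = predicted.get(inquiry_id, {})
        for parameter, expected_value in expected_params.items():
            total += 1
            predicted_value = predicted_params.get(parameter)
            if normalize(predicted_value) == normalize(expected_value):
                correct += 1
            else:
                mismatches.append((inquiry_id, parameter, expected_value, predicted_value))
    accuracy = correct / total if total else 0.0
    return accuracy, mismatches
```

The list of mismatches is the useful part during prompt iteration: it shows which parameter prompts to revise before the next run.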

Confidence scoring. The system computes a confidence score for each parameter semi-deterministically from the model's output. We avoided asking the model to rate its own confidence, because self-reported confidence tends to be poorly calibrated on specialized domain tasks. High-confidence results proceed directly to review-ready status, and low-confidence results are flagged for expert attention.
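
One way to derive a score from the output rather than from the model's self-assessment is to run deterministic checks on each extraction. The case study does not describe the actual signals, so the three checks, the equal weighting, and the threshold below are all assumptions chosen for illustration.

```python
import re
from typing import Optional

REVIEW_READY = "review_ready"
NEEDS_EXPERT = "needs_expert_attention"


def confidence_score(
    value: Optional[str],
    citation: Optional[str],
    document_text: str,
    pattern: Optional[str] = None,
) -> float:
    """Fraction of deterministic checks that pass (hypothetical signals)."""
    checks = [
        # 1. The citation is a verbatim substring of the source document.
        bool(citation) and citation in document_text,
        # 2. The extracted value is supported by the cited text.
        bool(value) and bool(citation) and value.casefold() in citation.casefold(),
        # 3. The value matches the expected format, if one is defined.
        bool(value) and (pattern is None or re.fullmatch(pattern, value) is not None),
    ]
    return sum(checks) / len(checks)


def route(score: float, threshold: float = 0.99) -> str:
    """Send high-confidence values to review-ready status, the rest to an expert."""
    return REVIEW_READY if score >= threshold else NEEDS_EXPERT
```

With three equally weighted checks, the default threshold means a value reaches review-ready status only when every check passes. A real system would likely tune both the checks and the threshold per parameter.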

Feedback loop. When an expert corrects a misclassification, they record the corrected value and a short explanation. The pipeline incorporates these corrections into its prompt context for subsequent extractions. Accuracy improves incrementally as experts use the system, without any retraining.
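
A minimal sketch of such a feedback loop, assuming corrections are stored per parameter and the most recent ones are added to the prompt. The `Correction` fields beyond the corrected value and explanation, the `limit` of five, and the prompt wording are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    parameter: str
    source_excerpt: str   # text the model misread (assumed field)
    wrong_value: str      # what the model extracted (assumed field)
    corrected_value: str  # what the expert entered
    explanation: str      # the expert's short explanation


@dataclass
class CorrectionStore:
    corrections: list = field(default_factory=list)

    def record(self, correction: Correction) -> None:
        self.corrections.append(correction)

    def context_for(self, parameter: str, limit: int = 5) -> str:
        """Format the most recent corrections for one parameter as prompt context."""
        relevant = [c for c in self.corrections if c.parameter == parameter][-limit:]
        if not relevant:
            return ""
        lines = ["Earlier expert corrections for this parameter:"]
        for c in relevant:
            lines.append(
                f'- Text: "{c.source_excerpt}" | wrong: {c.wrong_value} '
                f"| correct: {c.corrected_value} | reason: {c.explanation}"
            )
        return "\n".join(lines)


def build_prompt_with_feedback(parameter: str, document_text: str, store: CorrectionStore) -> str:
    """Prepend relevant corrections to the extraction prompt; the model is unchanged."""
    parts = [f"Extract the parameter '{parameter}' from the inquiry below."]
    context = store.context_for(parameter)
    if context:
        parts.append(context)
    parts.append(f"Inquiry:\n{document_text}")
    return "\n\n".join(parts)
```

Only the prompt changes between runs, which is why this kind of loop needs no retraining.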

The interface provides a Kanban board: Processing → Waiting for Review → In Review → Done. Each expert receives an email notification with a direct link to their review items. Finalized outputs export to an Excel spreadsheet for the sales team to use in quoting.
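
The board's columns can be modeled as a small state machine. The status names come from the board described above; the rule that items only move forward one column at a time is an assumption for this sketch, and `advance` is a hypothetical helper.

```python
from enum import Enum


class Status(Enum):
    PROCESSING = "Processing"
    WAITING_FOR_REVIEW = "Waiting for Review"
    IN_REVIEW = "In Review"
    DONE = "Done"


# Assumed forward-only transitions between board columns.
NEXT_STATUS = {
    Status.PROCESSING: Status.WAITING_FOR_REVIEW,
    Status.WAITING_FOR_REVIEW: Status.IN_REVIEW,
    Status.IN_REVIEW: Status.DONE,
}


def advance(status: Status) -> Status:
    """Move an inquiry to the next column, or fail if it is already finished."""
    if status not in NEXT_STATUS:
        raise ValueError(f"No transition defined from {status.value!r}")
    return NEXT_STATUS[status]
```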

Results

Classification work that previously consumed up to five hours a week now runs in minutes. Sales teams can respond to inquiries within hours instead of days.

We validated the >99% accuracy on a benchmark of 60 real inquiries with expected outputs prepared by experts. Feedback from the team was positive, with several members describing the result as extremely positive.

You can directly reuse the core pattern of parallel parameter extraction, confidence-based routing, and expert feedback loops across other industrial classification problems. Any domain where humans classify unstructured inputs against a defined schema is a candidate.

Facing a similar challenge? Book a free consultation.

In a free 30-minute strategy call, you'll get an assessment of the biggest AI potential in your company, 2–3 concrete next steps, and a clear estimate of your return on investment.
