Two-stage architecture
Trace OCR uses a two-stage pipeline:

- Detection: a deep learning model scans the full image and outputs bounding boxes around text regions
- Recognition: each detected region is cropped and fed to a second model that reads the characters
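
The two stages can be sketched as a small driver function. This is a minimal illustration, not Trace OCR's implementation: run_two_stage, crop, and the stand-in fake_detect / fake_recognize callables are hypothetical names, the image is a toy list of pixel rows, and boxes are assumed to use the normalized [x_min, y_min, x_max, y_max] layout described under Geometry format.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)
Image = List[List[int]]                  # toy grayscale image: rows of pixel values


def crop(image: Image, box: Box) -> Image:
    """Cut the region described by a normalized box out of the image."""
    height, width = len(image), len(image[0])
    x0, y0 = int(box[0] * width), int(box[1] * height)
    x1, y1 = int(box[2] * width), int(box[3] * height)
    return [row[x0:x1] for row in image[y0:y1]]


def run_two_stage(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 finds text regions; stage 2 reads each cropped region."""
    results = []
    for box in detect(image):
        results.append((box, recognize(crop(image, box))))
    return results


# Stand-in models, for illustration only (hypothetical).
def fake_detect(image: Image) -> List[Box]:
    return [(0.0, 0.0, 0.5, 1.0), (0.5, 0.0, 1.0, 1.0)]


def fake_recognize(region: Image) -> str:
    return f"{len(region[0])}x{len(region)} crop"


page = [[0] * 8 for _ in range(4)]  # 8 pixels wide, 4 pixels tall
output = run_two_stage(page, fake_detect, fake_recognize)
```

Keeping the stages behind separate callables mirrors the fact that the detection and recognition models are configured independently.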
Detection stage
The detection model takes a full document image and produces a list of bounding boxes, each containing text. You can tune detection behavior with these parameters:

| Parameter | Default | Description |
|---|---|---|
| detection_model | db_resnet50 | The detection architecture to use |
| assume_straight_pages | true | Skip rotation handling for faster inference on upright documents |
| preserve_aspect_ratio | true | Pad the image to preserve aspect ratio before detection |
| symmetric_padding | true | Pad symmetrically instead of bottom-right only |
| binary_threshold | 0.1 | Pixel-level threshold for the segmentation map |
| box_threshold | 0.1 | Minimum confidence to keep a detected box |
| detection_batch_size | 2 | Number of pages processed in parallel |
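
To make the two thresholds concrete, the sketch below shows one plausible reading of how they act: binary_threshold turns the segmentation map into a text/background mask, and box_threshold drops low-confidence boxes. The function names, the dictionary layout for boxes, the sample values, and the use of >= at the boundary are assumptions for illustration; the service's actual post-processing may differ.

```python
from typing import Any, Dict, List


def binarize(prob_map: List[List[float]], binary_threshold: float = 0.1) -> List[List[int]]:
    """Mark each pixel as text (1) or background (0) by thresholding its probability."""
    return [[1 if p >= binary_threshold else 0 for p in row] for row in prob_map]


def keep_confident_boxes(boxes: List[Dict[str, Any]], box_threshold: float = 0.1) -> List[Dict[str, Any]]:
    """Discard boxes whose confidence falls below box_threshold."""
    return [box for box in boxes if box["confidence"] >= box_threshold]


# Made-up sample values, not real model output.
prob_map = [
    [0.02, 0.40],
    [0.09, 0.95],
]
candidates = [
    {"geometry": [0.10, 0.10, 0.30, 0.15], "confidence": 0.05},
    {"geometry": [0.10, 0.20, 0.60, 0.25], "confidence": 0.80},
]

mask = binarize(prob_map)
kept = keep_confident_boxes(candidates)
```

Under this reading, raising binary_threshold shrinks the mask, while raising box_threshold removes more candidate boxes.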
Recognition stage
Each detected text region is cropped and resized, then passed to the recognition model. The model outputs:

- value: the recognized text string
- confidence: a score from 0 to 1 indicating how certain the model is
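
A common use of the confidence score is to route uncertain words to manual review. The sketch below assumes each recognized word arrives as a dictionary with value and confidence keys; the helper name split_by_confidence, the 0.5 cutoff, and the sample words are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Word = Dict[str, Any]  # assumed shape: {"value": str, "confidence": float}


def split_by_confidence(words: List[Word], min_confidence: float = 0.5) -> Tuple[List[Word], List[Word]]:
    """Separate words into (accepted, flagged) around a confidence cutoff."""
    accepted = [w for w in words if w["confidence"] >= min_confidence]
    flagged = [w for w in words if w["confidence"] < min_confidence]
    return accepted, flagged


# Made-up sample words, not real model output.
words = [
    {"value": "Invoice", "confidence": 0.98},
    {"value": "1O23", "confidence": 0.41},
]
accepted, flagged = split_by_confidence(words)
```

The right cutoff depends on the documents being processed, so it is worth tuning against a labeled sample.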
You can tune recognition behavior with these parameters:

| Parameter | Default | Description |
|---|---|---|
| recognition_model | crnn_vgg16_bn | The recognition architecture to use |
| recognition_batch_size | 128 | Number of crops processed in parallel |
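
The effect of recognition_batch_size can be illustrated with a simple chunking helper. This is a sketch of batching in general, with a hypothetical batched function and placeholder crop names; it does not show how the service schedules work internally.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int = 128) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 300 placeholder crops are split into batches of 128, 128, and 44.
crops = [f"crop_{i}" for i in range(300)]
batch_sizes = [len(batch) for batch in batched(crops)]
```

A page with many small words produces many crops, so this value typically matters more for throughput than detection_batch_size does on dense documents.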
Response hierarchy
The full OCR endpoint (/ocr) returns a nested structure. Grouping into lines and blocks is controlled by the group_lines (default true) and group_blocks (default false) parameters on the /ocr endpoint.
Geometry format
All bounding boxes use normalized coordinates: floats between 0 and 1 relative to the page dimensions. For straight pages (assume_straight_pages=true), geometry is [x_min, y_min, x_max, y_max].
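
Because the coordinates are normalized, they need to be scaled by the page size before drawing or cropping. The following sketch converts a straight-page geometry to pixel coordinates; to_pixels is a hypothetical helper, and the page size and box values are made up for the example.

```python
from typing import Sequence, Tuple


def to_pixels(geometry: Sequence[float], page_width: int, page_height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized [x_min, y_min, x_max, y_max] box to integer pixel coordinates."""
    x_min, y_min, x_max, y_max = geometry
    return (
        round(x_min * page_width),
        round(y_min * page_height),
        round(x_max * page_width),
        round(y_max * page_height),
    )


# A box covering the middle half of the width on a 1000x800 pixel page.
box = to_pixels([0.25, 0.5, 0.75, 0.625], page_width=1000, page_height=800)
# box == (250, 400, 750, 500)
```

Horizontal values scale by the page width and vertical values by the page height, so the same normalized box maps to different pixel sizes on pages with different aspect ratios.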
Additional options
The /ocr and /kie endpoints support extra parameters:

| Parameter | Default | Description |
|---|---|---|
| detect_orientation | false | Estimate the page rotation angle |
| detect_language | false | Predict the page language |
| straighten_pages | false | Auto-rotate pages before detection |
| disable_page_orientation | false | Skip page orientation classification |
| disable_text_orientation | false | Skip per-crop text orientation classification |
| paragraph_break | 0.0035 | Vertical gap threshold for splitting blocks (OCR only) |
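
As a rough illustration of paragraph_break, the sketch below starts a new block whenever the vertical gap between consecutive lines exceeds the threshold. It assumes the gap is measured in normalized page units between one line's y_max and the next line's y_min; that interpretation, the split_into_blocks name, and the sample geometries are assumptions, and the service's grouping logic may work differently.

```python
from typing import List, Sequence

Geometry = Sequence[float]  # normalized [x_min, y_min, x_max, y_max]


def split_into_blocks(lines: List[Geometry], paragraph_break: float = 0.0035) -> List[List[Geometry]]:
    """Group lines top to bottom, opening a new block when the vertical gap is too large."""
    blocks: List[List[Geometry]] = []
    for line in sorted(lines, key=lambda g: g[1]):
        # Gap between the top of this line and the bottom of the previous one.
        if blocks and line[1] - blocks[-1][-1][3] <= paragraph_break:
            blocks[-1].append(line)
        else:
            blocks.append([line])
    return blocks


# Made-up line geometries: two tightly spaced lines, then a larger gap.
lines = [
    [0.1, 0.100, 0.9, 0.120],
    [0.1, 0.122, 0.9, 0.142],
    [0.1, 0.160, 0.9, 0.180],
]
blocks = split_into_blocks(lines)
block_lengths = [len(block) for block in blocks]
```

Under this reading, a smaller paragraph_break yields more, shorter blocks, and a larger one merges nearby paragraphs.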