What is key information extraction?

Key information extraction (KIE) runs the same detection + recognition pipeline as /ocr, but structures the output differently. Instead of a nested hierarchy of pages, blocks, lines, and words, the /kie endpoint groups all detected text under labeled field types — returning a flat list of recognized words with their positions and confidence scores. The default predictor uses a single "words" field type that contains every detected word. Custom field types (e.g. "total", "date", "vendor") require fine-tuning on a labeled dataset. Use KIE when you want:
  • Flat structured output — skip the block/line hierarchy and get all words directly for downstream processing
  • Custom classification pipelines — pair the flat word list with your own field-type logic
  • Preparing for fine-tuning — start with the default output, then train on labeled data (CORD, FUNSD, SROIE) to get semantic field types

Example: extracting fields from an invoice

Here’s the same invoice from the text detection guide:
Sample invoice with header, addresses, line items, and totals
Send it to the KIE endpoint:
curl -X POST https://ocr-api.trace.so/kie/ \
  -F "[email protected]"
The API returns extracted fields grouped by field type:
[
  {
    "name": "invoice.pdf",
    "orientation": { "value": 0.0, "confidence": null },
    "language": { "value": null, "confidence": null },
    "dimensions": [595, 842],
    "extracted_fields": [
      {
        "field_type": "words",
        "values": [
          {
            "value": "Invoice",
            "geometry": [0.5842, 0.1396, 0.6339, 0.1514],
            "detection_score": 0.99,
            "confidence": 0.96,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "INV-3337",
            "geometry": [0.7582, 0.1396, 0.8231, 0.1504],
            "detection_score": 0.99,
            "confidence": 0.77,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "January",
            "geometry": [0.7554, 0.1768, 0.8148, 0.1914],
            "detection_score": 0.99,
            "confidence": 0.91,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "$93.50",
            "geometry": [0.7568, 0.2148, 0.8052, 0.2266],
            "detection_score": 0.99,
            "confidence": 0.99,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "Test",
            "geometry": [0.0692, 0.291, 0.1023, 0.3037],
            "detection_score": 0.99,
            "confidence": 1.0,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "Business",
            "geometry": [0.1037, 0.292, 0.1658, 0.3027],
            "detection_score": 0.99,
            "confidence": 1.0,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "Web",
            "geometry": [0.17, 0.417, 0.2059, 0.4287],
            "detection_score": 0.99,
            "confidence": 1.0,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "Design",
            "geometry": [0.2045, 0.416, 0.2556, 0.4316],
            "detection_score": 0.99,
            "confidence": 1.0,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "$85.00",
            "geometry": [0.6077, 0.4229, 0.656, 0.4346],
            "detection_score": 0.99,
            "confidence": 0.79,
            "text_orientation": { "value": 0, "confidence": null }
          },
          {
            "value": "$8.50",
            "geometry": [0.8742, 0.4834, 0.9184, 0.498],
            "detection_score": 0.99,
            "confidence": 0.91,
            "text_orientation": { "value": 0, "confidence": null }
          }
        ]
      }
    ]
  }
]
The response above is abbreviated — this invoice produces 111 words in total. All words appear under the default "words" field type. Custom field types like "total" or "date" require a fine-tuned model.
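Because the output is already flat, post-processing needs no tree traversal. A minimal sketch, using a hand-copied subset of the example response above — collect the words under the default "words" field type and keep only confident recognitions:

```python
# Subset of the /kie response shown above (keys as they appear in the output).
page = {
    "name": "invoice.pdf",
    "dimensions": [595, 842],
    "extracted_fields": [
        {
            "field_type": "words",
            "values": [
                {"value": "Invoice", "geometry": [0.5842, 0.1396, 0.6339, 0.1514], "confidence": 0.96},
                {"value": "INV-3337", "geometry": [0.7582, 0.1396, 0.8231, 0.1504], "confidence": 0.77},
                {"value": "$93.50", "geometry": [0.7568, 0.2148, 0.8052, 0.2266], "confidence": 0.99},
            ],
        }
    ],
}

# Gather every word under the "words" field type...
words = [
    v
    for field in page["extracted_fields"]
    if field["field_type"] == "words"
    for v in field["values"]
]

# ...then filter on recognition confidence for downstream use.
confident = [w["value"] for w in words if w["confidence"] >= 0.9]
print(confident)  # ['Invoice', '$93.50']
```

The 0.9 threshold is illustrative; tune it to your tolerance for recognition errors.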
Each value in the response includes:
Field | Description
value | The recognized text string
geometry | Bounding box as [x_min, y_min, x_max, y_max], normalized 0–1 relative to page dimensions
confidence | Recognition confidence (0–1) — how sure the model is about the text content
detection_score | Detection confidence (0–1) — how sure the model is that this region contains text
text_orientation | Detected rotation of the text crop (0 = horizontal)
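Since geometry is normalized, converting a box to pixel coordinates is a matter of scaling by the page size. A sketch — note it assumes dimensions is [width, height] for this page; verify the order against your own responses:

```python
def to_pixels(geometry, width, height):
    """Scale a normalized [x_min, y_min, x_max, y_max] box to pixel coordinates."""
    x_min, y_min, x_max, y_max = geometry
    return [
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    ]

# The "Invoice" word from the example response, on a 595 x 842 page.
box = to_pixels([0.5842, 0.1396, 0.6339, 0.1514], width=595, height=842)
print(box)  # [348, 118, 377, 127]
```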

Parameters

All parameters are optional query parameters passed in the URL.
Parameter | Default | Description
detection_model | db_resnet50 | The detection architecture to use. See available models.
recognition_model | crnn_vgg16_bn | The recognition architecture to use. See available models.
assume_straight_pages | true | Return axis-aligned boxes. Set to false for rotated documents to get 4-point polygons.
preserve_aspect_ratio | true | Pad the image to preserve its aspect ratio before feeding it to the model.
detect_orientation | false | Detect and report the page orientation (rotation angle).
detect_language | false | Detect and report the page language.
symmetric_padding | true | Pad symmetrically (centered) rather than bottom-right only.
straighten_pages | false | Automatically rotate pages to correct detected skew.
detection_batch_size | 2 | Number of pages processed in parallel for detection. Increase for multi-page PDFs if you have enough memory.
recognition_batch_size | 128 | Number of text crops processed in parallel for recognition. Decrease if you run into memory issues.
disable_page_orientation | false | Skip page orientation classification entirely.
disable_text_orientation | false | Skip text crop orientation classification.
binary_threshold | 0.1 | Pixel-level threshold for the segmentation heatmap.
box_threshold | 0.1 | Minimum confidence to keep a detected box.
# Use a different model pair with orientation detection
curl -X POST "https://ocr-api.trace.so/kie/?detection_model=db_resnet50&recognition_model=parseq&detect_orientation=true" \
  -F "[email protected]"

KIE vs full OCR

The /kie and /ocr endpoints run the same detection and recognition models, but return results in different structures.
 | /kie | /ocr
Output structure | Flat list of words grouped by field type | Nested hierarchy: pages > blocks > lines > words
Field grouping | Words grouped under labeled field types (default: "words") | Words grouped by spatial proximity into lines and blocks
Spatial context | Individual word positions only | Line-level and block-level bounding boxes included
Use when | You need a flat word list for downstream processing or custom field classification | You need the document’s visual structure (paragraphs, columns, sections)
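To see the difference concretely, here is a sketch of flattening a nested /ocr page into the word list that /kie returns directly. The nested dict below is an assumed shape based on the pages > blocks > lines > words hierarchy described above, not a verbatim /ocr response:

```python
# Hypothetical /ocr page following the blocks > lines > words nesting.
ocr_page = {
    "blocks": [
        {
            "lines": [
                {"words": [{"value": "Invoice", "confidence": 0.96},
                           {"value": "INV-3337", "confidence": 0.77}]},
                {"words": [{"value": "$93.50", "confidence": 0.99}]},
            ]
        }
    ]
}

# Three nested loops collapse the hierarchy into a flat word list.
flat = [
    word
    for block in ocr_page["blocks"]
    for line in block["lines"]
    for word in line["words"]
]
print([w["value"] for w in flat])  # ['Invoice', 'INV-3337', '$93.50']
```

If you find yourself writing this traversal after every /ocr call, that is a sign /kie is the better fit.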

Understanding field types

The default KIE predictor groups all detected text under a single "words" field type. It does not semantically classify fields — every word on the page appears in the same list regardless of whether it’s an invoice number, date, or line item. With a fine-tuned model, the output would contain multiple field types:
{
  "extracted_fields": [
    {
      "field_type": "vendor_name",
      "values": [
        { "value": "DEMO", "geometry": [0.0706, 0.1611, 0.1175, 0.1729], "confidence": 1.0, "..." : "..." },
        { "value": "Sliced", "geometry": [0.1272, 0.1611, 0.1714, 0.1729], "confidence": 1.0, "..." : "..." },
        { "value": "Invoices", "geometry": [0.1728, 0.1611, 0.2294, 0.1719], "confidence": 0.99, "..." : "..." }
      ]
    },
    {
      "field_type": "total_amount",
      "values": [
        { "value": "$93.50", "geometry": [0.7568, 0.2148, 0.8052, 0.2266], "confidence": 0.99, "..." : "..." }
      ]
    }
  ]
}
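With multiple field types, a common next step is collapsing each field's word list into a single string. A minimal sketch over the fine-tuned example above (geometry and score keys omitted for brevity):

```python
# Field values copied from the fine-tuned example output above.
extracted_fields = [
    {"field_type": "vendor_name", "values": [
        {"value": "DEMO"}, {"value": "Sliced"}, {"value": "Invoices"},
    ]},
    {"field_type": "total_amount", "values": [{"value": "$93.50"}]},
]

# Join each field's words into one string, keyed by field type.
fields = {
    f["field_type"]: " ".join(v["value"] for v in f["values"])
    for f in extracted_fields
}
print(fields)  # {'vendor_name': 'DEMO Sliced Invoices', 'total_amount': '$93.50'}
```

Joining on a space works for left-to-right fields like names; for multi-line fields such as addresses you may want to sort values by their geometry first.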
Training datasets for document KIE include:
  • CORD — receipt understanding with 30 field types (menu items, prices, totals)
  • FUNSD — form understanding with header, question, answer, and other fields
  • SROIE — receipt extraction for company, date, address, and total
These datasets are commonly used for fine-tuning KIE models.

Next steps