
What is text detection?

Text detection is the first stage of the OCR pipeline — it finds where text appears in an image without reading it. The /detection endpoint returns bounding box coordinates for every text region it finds. Use detection on its own when you only need to know where text is, not what it says:
  • Document layout analysis — identify text blocks, headers, and table cells by their positions
  • Text region highlighting — draw attention to text areas in a UI
  • Pre-processing for custom pipelines — feed detected regions into your own recognition model or downstream logic
  • Counting text blocks — quickly assess how much text is on a page
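For the last use case, a minimal sketch of counting text blocks from an already-parsed `/detection` response (the response structure matches the example shown later on this page; the `count_text_blocks` helper is illustrative, not part of the API):

```python
# Sample parsed /detection response, abbreviated to two boxes.
sample_response = [
    {
        "name": "invoice.pdf",
        "bounding_boxes": [
            [0.0733, 0.0283, 0.3288, 0.0596],
            [0.8065, 0.0645, 0.9294, 0.0859],
        ],
    }
]

def count_text_blocks(response):
    """Return {filename: number of detected text regions}."""
    return {doc["name"]: len(doc["bounding_boxes"]) for doc in response}

print(count_text_blocks(sample_response))  # {'invoice.pdf': 2}
```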

Example: detecting text in an invoice

Here’s an invoice PDF we’ll use to demonstrate detection:
Sample invoice with header, addresses, line items, and totals
Send it to the detection endpoint:
curl -X POST https://ocr-api.trace.so/detection/ \
  -F "[email protected]"
The model scans the full page and draws bounding boxes around every text region:
Invoice with blue bounding boxes overlaid on every text region
The API returns a flat list of bounding boxes for each uploaded file:
[
  {
    "name": "invoice.pdf",
    "bounding_boxes": [
      [0.0733, 0.0283, 0.3288, 0.0596],
      [0.8065, 0.0645, 0.9294, 0.0859],
      [0.5842, 0.1396, 0.6339, 0.1514],
      [0.6353, 0.1396, 0.6906, 0.1504],
      [0.7582, 0.1396, 0.8231, 0.1504],
      [0.0706, 0.1465, 0.1148, 0.1572],
      [0.0706, 0.1611, 0.1175, 0.1729],
      [0.1272, 0.1611, 0.1714, 0.1729],
      [0.1728, 0.1611, 0.2294, 0.1719],
      [0.5828, 0.1582, 0.6257, 0.1699]
    ]
  }
]
The response above is abbreviated — this invoice produces 111 bounding boxes in total. Each box corresponds to a single word-level text region.
Each bounding box is an array of four normalized coordinates: [x_min, y_min, x_max, y_max], where values range from 0 to 1 relative to the page dimensions. To convert to pixel coordinates, multiply by the page width and height. See Geometry format for details.
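The conversion described above can be sketched as a small helper. The 1000 × 1294 px page size below is a hypothetical render resolution, not something the API returns:

```python
def to_pixels(box, page_width, page_height):
    """Convert a normalized [x_min, y_min, x_max, y_max] box to pixel coordinates."""
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * page_width),
        round(y_min * page_height),
        round(x_max * page_width),
        round(y_max * page_height),
    )

# First box from the invoice response, on a hypothetical 1000 x 1294 px render:
print(to_pixels([0.0733, 0.0283, 0.3288, 0.0596], 1000, 1294))  # (73, 37, 329, 77)
```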

Parameters

All parameters are optional query parameters passed in the URL.
| Parameter | Default | Description |
|---|---|---|
| `detection_model` | `db_resnet50` | The detection architecture to use. See available models for options. |
| `assume_straight_pages` | `true` | Return axis-aligned boxes. Set to `false` for rotated documents to get 4-point polygons instead. |
| `preserve_aspect_ratio` | `true` | Pad the image to preserve its aspect ratio before feeding it to the model. |
| `symmetric_padding` | `true` | Pad symmetrically (centered) rather than bottom-right only. |
| `detection_batch_size` | `2` | Number of pages processed in parallel. Increase for multi-page PDFs if you have enough memory. |
| `binary_threshold` | `0.1` | Pixel-level threshold for the segmentation heatmap. |
| `box_threshold` | `0.1` | Minimum confidence to keep a detected box. |
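Since all parameters travel in the query string, a request URL can be assembled with the standard library before sending the multipart upload (the specific overrides below are illustrative; parameter names come from the table above):

```python
from urllib.parse import urlencode

BASE = "https://ocr-api.trace.so/detection/"

# Example overrides: polygons for a rotated document, larger batch for a long PDF.
params = {
    "assume_straight_pages": "false",
    "detection_batch_size": 8,
}
url = f"{BASE}?{urlencode(params)}"
print(url)
# https://ocr-api.trace.so/detection/?assume_straight_pages=false&detection_batch_size=8
```

The file itself is still uploaded as multipart form data, e.g. with `curl -F` as shown earlier.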

Tuning thresholds

The two threshold parameters control the sensitivity/precision trade-off:
  • Lower thresholds — detect more text regions, including faint or low-contrast text, but may introduce false positives
  • Higher thresholds — detect fewer, higher-confidence regions, reducing noise but potentially missing subtle text
# More aggressive detection (catch faint text)
curl -X POST "https://ocr-api.trace.so/detection/?binary_threshold=0.05&box_threshold=0.05" \
  -F "[email protected]"

# More conservative detection (reduce noise)
curl -X POST "https://ocr-api.trace.so/detection/?binary_threshold=0.3&box_threshold=0.3" \
  -F "[email protected]"

Detection vs full OCR

The /detection endpoint returns only bounding boxes — it tells you where text is but not what it says. The /ocr endpoint runs both detection and recognition, returning a full nested hierarchy of pages, blocks, lines, and words with recognized text and confidence scores.
| | /detection | /ocr |
|---|---|---|
| Output | Flat list of bounding boxes | Nested hierarchy with recognized text |
| Speed | Faster (one model) | Slower (two models) |
| Response size | Smaller | Larger |
| Use when | You only need text locations | You need to read the text |
For a full walkthrough of the OCR pipeline, see OCR pipeline.

Next steps