Detection models

Detection models localize text regions in the input image. Specify one via the detection_model query parameter.
Model                   Architecture                Notes
db_resnet50             DBNet + ResNet-50           Default. Good balance of speed and accuracy
db_resnet34             DBNet + ResNet-34           Lighter than ResNet-50, slightly faster
db_mobilenet_v3_large   DBNet + MobileNetV3-Large   Mobile-optimized backbone, fastest DBNet variant
linknet_resnet18        LinkNet + ResNet-18         Lightweight encoder-decoder
linknet_resnet34        LinkNet + ResNet-34         Mid-range LinkNet
linknet_resnet50        LinkNet + ResNet-50         Heaviest LinkNet variant
fast_tiny               FAST-Tiny                   Fastest overall, lower accuracy
fast_small              FAST-Small                  Good speed/accuracy trade-off
fast_base               FAST-Base                   Best FAST accuracy

Choosing a detection model

  • For highest accuracy: db_resnet50 or fast_base
  • For fastest inference: fast_tiny or db_mobilenet_v3_large
  • For balanced performance: fast_small or db_resnet34
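
For instance, prioritizing speed just means passing the fastest detector in the query string. A minimal Python sketch (the model choice is illustrative; the base URL is the one used in the curl example later on this page):

```python
from urllib.parse import urlencode

# Build the request URL for the fastest detection setup.
# The detection_model parameter name comes from this page's docs.
base = "https://ocr-api.trace.so/ocr/"
url = f"{base}?{urlencode({'detection_model': 'fast_tiny'})}"
print(url)  # https://ocr-api.trace.so/ocr/?detection_model=fast_tiny
```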

Recognition models

Recognition models read text from cropped image regions. Specify one via the recognition_model query parameter.
Model                     Architecture                Notes
crnn_vgg16_bn             CRNN + VGG-16-BN            Default. Proven CTC-based architecture
crnn_mobilenet_v3_small   CRNN + MobileNetV3-Small    Fastest CRNN variant
crnn_mobilenet_v3_large   CRNN + MobileNetV3-Large    Mobile-optimized, more accurate than small
sar_resnet31              SAR + ResNet-31             Attention-based, handles curved text
master                    MASTER                      Multi-aspect transformer for scene text
vitstr_small              ViTSTR-Small                Vision transformer, small variant
vitstr_base               ViTSTR-Base                 Vision transformer, base variant
parseq                    PARSeq                      State-of-the-art permutation-based decoding
viptr_tiny                ViPTR-Tiny                  Vision-Perceiver transformer, compact

Choosing a recognition model

  • For highest accuracy: parseq or master
  • For fastest inference: crnn_mobilenet_v3_small
  • For balanced performance: crnn_vgg16_bn (the default)
  • For curved or rotated text: sar_resnet31

Example: custom model combination

curl -X POST "https://ocr-api.trace.so/ocr/?detection_model=fast_base&recognition_model=parseq" \
  -F "[email protected]"