Document Layout Analysis "egret-xlarge"

🚀 egret-xlarge is a Document Layout Analysis Model used in the Docling project.

📄 For an in-depth description of the model architecture, training datasets, and evaluation methodology, please refer to our technical report: "Advanced Layout Analysis Models for Docling", Nikolaos Livathinos et al., 🔗 https://arxiv.org/abs/2509.11720

Inference code example

Prerequisites:

pip install transformers Pillow torch requests

Prediction:

import requests
from transformers import (
    DFineForObjectDetection,
    RTDetrImageProcessor,
)
import torch
from PIL import Image


classes_map = {
    0: "Caption",
    1: "Footnote",
    2: "Formula",
    3: "List-item",
    4: "Page-footer",
    5: "Page-header",
    6: "Picture",
    7: "Section-header",
    8: "Table",
    9: "Text",
    10: "Title",
    11: "Document Index",
    12: "Code",
    13: "Checkbox-Selected",
    14: "Checkbox-Unselected",
    15: "Form",
    16: "Key-Value Region",
}
image_url = "https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo/resolve/main/example_images/annual_rep_14.png"
model_name = "ds4sd/docling-layout-egret-xlarge"
threshold = 0.6

# Download the image
image = Image.open(requests.get(image_url, stream=True).raw)
image = image.convert("RGB")


# Initialize the model
image_processor = RTDetrImageProcessor.from_pretrained(model_name)
model = DFineForObjectDetection.from_pretrained(model_name)

# Run the prediction pipeline
inputs = image_processor(images=[image], return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = image_processor.post_process_object_detection(
    outputs,
    target_sizes=torch.tensor([image.size[::-1]]),
    threshold=threshold,
)

# Get the results
for result in results:
    for score, label_id, box in zip(
        result["scores"], result["labels"], result["boxes"]
    ):
        score = round(score.item(), 2)
        label = classes_map[label_id.item()]
        box = [round(i, 2) for i in box.tolist()]
        print(f"{label}:{score} {box}")

References

@misc{livathinos2025advancedlayoutanalysismodels,
      title={advanced layout analysis models for docling},
      author={nikolaos livathinos and christoph auer and ahmed nassar and rafael teixeira de lima and maksym lysak and brown ebouky and cesar berrospi and michele dolfi and panagiotis vagenas and matteo omenetti and kasper dinkla and yusik kim and valery weber and lucas morin and ingmar meijer and viktor kuropiatnyk and tim strohmeyer and a. said gurbuz and peter w. j. staar},
      year={2025},
      eprint={2509.11720},
      archiveprefix={arxiv},
      primaryclass={cs.cv},
      url={https://arxiv.org/abs/2509.11720},
}

@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {Docling Technical Report},
  url = {https://arxiv.org/abs/2408.09869v4},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}

Downloads last month: 515

Safetensors

Model size

62.7M params

Tensor type

F32

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

docling-project
/

docling-layout-egret-xlarge

Document Layout Analysis "egret-xlarge"

Inference code example

References

Space using docling-project/docling-layout-egret-xlarge 1