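---
library_name: transformers
pipeline_tag: image-segmentation
license: apache-2.0
tags:
- vision
- image-segmentation
- dit
datasets:
- ds4sd/DocLayNet-v1.1
widget:
- src: >-
    https://upload.wikimedia.org/wikipedia/commons/c/c3/LibreOffice_Writer_6.3.png
  example_title: Wiki
---

Fine-tuned from `microsoft/dit-base` on `ds4sd/DocLayNet-v1.1` for 4 epochs.

```
import cv2
import numpy as np
from datasets import load_dataset
from transformers import BeitForSemanticSegmentation

model = BeitForSemanticSegmentation.from_pretrained("microsoft/dit-base", num_labels=11)

ds = load_dataset("ds4sd/DocLayNet-v1.1")

# `d` is a single example from `ds`; build one binary 1025x1025 mask per category.
mask = np.zeros([11, 1025, 1025])
for b, c in zip(d["bboxes"], d["category_id"]):
    # The slicing below treats each box as [x, y, width, height].
    b = [np.clip(int(bb), 0, 1025) for bb in b]
    mask[c - 1][b[1]:b[1]+b[3], b[0]:b[0]+b[2]] = 1
# Downsample every class mask to 56x56 and stack them into the label tensor.
mask = [cv2.resize(a, dsize=(56, 56), interpolation=cv2.INTER_AREA) for a in mask]
d["label"] = np.stack(mask)
```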
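
The mask-building step above can be checked in isolation on synthetic data. The following is a minimal sketch under stated assumptions, not the exact training code: `boxes_to_masks` is a hypothetical helper name, the box coordinates are made up, boxes are assumed to be `[x, y, width, height]` with 1-based category ids (as the `c - 1` indexing suggests), and the `cv2.resize` downsampling is left out so the example depends only on NumPy. Unlike the snippet above, it clips the box corners rather than each of the four values separately.

```python
import numpy as np

NUM_CLASSES = 11  # matches num_labels=11 in the snippet above
PAGE_SIZE = 1025  # mask side length used in the snippet above


def boxes_to_masks(bboxes, category_ids, num_classes=NUM_CLASSES, size=PAGE_SIZE):
    """Rasterize [x, y, width, height] boxes into one binary mask per class.

    Hypothetical helper; assumes 1-based category ids.
    """
    masks = np.zeros((num_classes, size, size), dtype=np.float32)
    for box, c in zip(bboxes, category_ids):
        x, y, w, h = (int(v) for v in box)
        # Clip both corners so boxes that run off the page are truncated.
        x0, y0 = np.clip([x, y], 0, size)
        x1, y1 = np.clip([x + w, y + h], 0, size)
        masks[c - 1, y0:y1, x0:x1] = 1.0
    return masks


# Synthetic example: two boxes with made-up coordinates and category ids.
masks = boxes_to_masks(
    bboxes=[[10, 20, 100, 50], [1000, 1000, 100, 100]],
    category_ids=[1, 10],
)
print(masks.shape)     # (11, 1025, 1025)
print(masks[0].sum())  # 5000.0 (100 x 50 pixels)
print(masks[9].sum())  # 625.0 (box truncated to 25 x 25 at the page edge)
```

Because each class has its own channel, overlapping boxes of different categories do not overwrite each other; a pixel can be positive for several classes at once.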