Ege Oezsoy committed on
Commit
1437924
1 Parent(s): 0859bb3

Upload Model Weights and Data

README.md CHANGED
@@ -1,3 +1,78 @@
  ---
  license: apache-2.0
  ---
+ <div align="center">
+ <h1>
+ ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling
+ </h1>
+ </div>
+
+ <p align="center">
+ <a href="https://github.com/egeozsoy/ORacle" target="_blank">GitHub</a>
+ </p>
+
+ <div align="center">
+ </div>
+
+ Every day, countless surgeries are performed worldwide, each within the distinct settings of operating rooms (ORs) that vary not only in their setups but also in the personnel, tools, and equipment used.
+ This inherent diversity poses a substantial challenge for achieving a holistic understanding of the OR, as it requires models to generalize beyond their initial training datasets.
+ To reduce this gap, we introduce ORacle, an advanced vision-language model designed for holistic OR domain modeling, which incorporates multi-view and temporal capabilities and can leverage external knowledge during inference, enabling it to adapt to previously unseen surgical scenarios.
+ This capability is further enhanced by our novel data augmentation framework, which significantly diversifies the training dataset, ensuring ORacle's proficiency in applying the provided knowledge effectively.
+ In rigorous testing on scene graph generation and downstream tasks on the 4D-OR dataset, ORacle not only demonstrates state-of-the-art performance but does so while requiring less data than existing models.
+ Furthermore, its adaptability is displayed through its ability to interpret unseen views, actions, and appearances of tools and equipment.
+ This demonstrates ORacle's potential to significantly enhance the scalability and affordability of OR domain modeling and opens a pathway for future advancements in surgical data science.
+
+ Please check out our GitHub page (https://github.com/egeozsoy/ORacle) for the full code.
+
+
+ **Authors**: [Ege Özsoy][eo]\*, [Chantal Pellegrini][cp]\*, [Matthias Keicher][mk], [Nassir Navab][nassir]
+
+ [eo]:https://www.cs.cit.tum.de/camp/members/ege-oezsoy/
+
+ [cp]:https://www.cs.cit.tum.de/camp/members/chantal-pellegrini/
+
+ [mk]:https://www.cs.cit.tum.de/camp/members/matthias-keicher/
+
+ [nassir]:https://www.cs.cit.tum.de/camp/members/cv-nassir-navab/nassir-navab/
+
+ ```
+ @article{ozsoy2024oracle,
+ title={ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling},
+ author={{\"O}zsoy, Ege and Pellegrini, Chantal and Keicher, Matthias and Navab, Nassir},
+ journal={arXiv preprint arXiv:2404.07031},
+ year={2024}
+ }
+
+ @inproceedings{Özsoy2023_LABRAD_OR,
+ title={LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms},
+ author={Ege Özsoy and Tobias Czempiel and Felix Holm and Chantal Pellegrini and Nassir Navab},
+ booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
+ year={2023},
+ organization={Springer}
+ }
+ @Article{Özsoy2023,
+ author={{\"O}zsoy, Ege
+ and Czempiel, Tobias
+ and {\"O}rnek, Evin P{\i}nar
+ and Eck, Ulrich
+ and Tombari, Federico
+ and Navab, Nassir},
+ title={Holistic OR domain modeling: a semantic scene graph approach},
+ journal={International Journal of Computer Assisted Radiology and Surgery},
+ year={2023},
+ doi={10.1007/s11548-023-03022-w},
+ url={https://doi.org/10.1007/s11548-023-03022-w}
+ }
+ @inproceedings{Özsoy2022_4D_OR,
+ title={4D-OR: Semantic Scene Graphs for OR Domain Modeling},
+ author={Ege Özsoy and Evin Pınar Örnek and Ulrich Eck and Tobias Czempiel and Federico Tombari and Nassir Navab},
+ booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
+ year={2022},
+ organization={Springer}
+ }
+ @inproceedings{Özsoy2021_MSSG,
+ title={Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures},
+ author={Ege Özsoy and Evin Pınar Örnek and Ulrich Eck and Federico Tombari and Nassir Navab},
+ booktitle={arXiv},
+ year={2021}
+ }
adaptability_4dor.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e15ee5a2e8baf42f1de925479a8903b48482ca66efbc73be7c0f56c5658db181
+ size 113398915
llava-v1.5-7b-task-lora_4dor_qlora_100perm_4_view_2135_orderaug_image.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a90c133c9338dedef3548b94088b45304e83b16b3bf0230a60165b3ece98c75
+ size 822784687
llava-v1.5-7b-task-lora_4dor_qlora_100perm_4_view_2135_orderaug_image_synthetic.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c500f6c69b092784fc9f32f8c41808abb38dd777e5398dfea530ca63c3b4eb85
+ size 822682960
llava-v1.5-7b-task-lora_4dor_qlora_100perm_4_view_2135_orderaug_image_synthetic_visual.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:899ada62344d9d98075d0e1d00c218b5752772f8f808febe7b97721fbe15d4fd
+ size 822713695
llava-v1.5-7b-task-lora_4dor_qlora_100perm_4_view_2135_orderaug_image_temporal_curriculum.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6954688078bb068d10cf15e6ccf386dd70ce9b5cfe391dae5a0e3418b98b268d
+ size 822508186
original_crops.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:922174e26fc2e38cb8f9d0a7cc6962ebbfd35849bba934b52a2df80c9893dd65
+ size 529662864
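
Each archive in this commit is stored as a Git LFS pointer that records the SHA-256 digest (`oid`) and the byte size of the real file. As a rough sketch, a downloaded archive could be checked against its pointer as shown below. The helper names `parse_lfs_pointer` and `verify_download` and the local path `adaptability_4dor.zip` are assumptions made for illustration; they are not part of the ORacle code base.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {
        "version": fields["version"],
        "sha256": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file matches the pointer's size and SHA-256 digest."""
    path = Path(path)
    # Cheap check first: a size mismatch means hashing is unnecessary.
    if path.stat().st_size != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large archives are never loaded fully into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]


# Pointer text copied from the adaptability_4dor.zip entry in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e15ee5a2e8baf42f1de925479a8903b48482ca66efbc73be7c0f56c5658db181
size 113398915
"""

if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER)
    local_copy = Path("adaptability_4dor.zip")  # assumed download location
    if local_copy.exists():
        print("OK" if verify_download(local_copy, pointer) else "MISMATCH")
```

The size comparison runs before hashing because it is nearly free and already catches truncated downloads. The same two functions should apply to the other archives once `POINTER` is swapped for the corresponding entry.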