Hui97 committed
Commit f90701e · verified · 1 Parent(s): 817fd61

Upload folder using huggingface_hub
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
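This pooling configuration enables mean pooling only: the sentence embedding is the attention-mask-aware average of the token embeddings. As a rough sketch in plain PyTorch (illustrative, not the library's internal implementation):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions."""
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)   # sum over real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens, avoid div-by-zero
    return summed / counts                          # (batch, 768)
```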
README.md ADDED
@@ -0,0 +1,174 @@
+ ---
+ tags:
+ - ontology-embedding
+ - hyperbolic-space
+ - hierarchical-reasoning
+ - biomedical-ontology
+ - generated_from_trainer
+ - dataset_size:150000
+ - loss:HierarchyTransformerLoss
+ base_model: sentence-transformers/all-mpnet-base-v2
+ widget:
+ - source_sentence: cellular response to stimulus
+   sentences:
+   - response to stimulus
+   - medial transverse frontopolar gyrus
+   - biological regulation
+ - source_sentence: regulation of cell differentiation involved in embryonic placenta
+     development
+   sentences:
+   - thoracic wall
+   - ectoderm-derived structure
+   - regulation of cell differentiation
+ - source_sentence: regulation of hippocampal neuron apoptotic process
+   sentences:
+   - external genitalia morphogenesis
+   - compact layer of ventricle
+   - biological regulation
+ - source_sentence: transitional myocyte of internodal tract
+   sentences:
+   - secretory epithelial cell
+   - internodal tract myocyte
+   - insect haltere disc
+ - source_sentence: alveolar atrium
+   sentences:
+   - organ part
+   - superior recess of lesser sac
+   - foramen of skull
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+ 
+ # OnT: Language Models as Ontology Encoders
+ 
+ This is an OnT (Ontology Transformer) model trained on the GO dataset, based on [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2). OnT is a language-model-based framework for ontology embedding: it represents concepts as points in hyperbolic space and models axioms as hierarchical relationships between those concepts.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Ontology Transformer (OnT)
+ - **Base model:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
+ - **Training Dataset:** GO
+ - **Maximum Sequence Length:** 384 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Embedding Space:** Hyperbolic Space
+ - **Key Features:**
+   - Hyperbolic embeddings for ontology concept encoding
+   - Modeling of hierarchical relationships between concepts
+   - Support for role embeddings as rotations over hyperbolic spaces
+   - Representation of concept rotation, transition, and existential quantifiers
+ 
+ ### Model Sources
+ 
+ - **Repository:** [OnT on GitHub](https://github.com/HuiYang1997/OnT)
+ - **Paper:** [Language Models as Ontology Encoders](https://arxiv.org/abs/2507.14334)
+ 
+ ### Available Versions
+ 
+ This model is available in **4 versions** (Git branches) to suit different use cases:
+ 
+ | Branch | Training Dataset | Role Embedding | Use Case |
+ |--------|------------------|----------------|----------|
+ | **`main`** (default) | Prediction | ✅ With role embedding | Default version: trained on the prediction dataset, with role embeddings |
+ | **`role-free`** | Prediction | ❌ Without role embedding | Trained on the prediction dataset, without role embeddings |
+ | **`inference-default`** | Inference | ✅ With role embedding | Trained on the inference dataset, with role embeddings |
+ | **`inference-role-free`** | Inference | ❌ Without role embedding | Trained on the inference dataset, without role embeddings |
+ 
+ **How to use different versions:**
+ 
+ ```python
+ from OnT import OntologyTransformer
+ 
+ # Default version (main branch - OnTr with role embedding)
+ ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go")
+ 
+ # Role-free version (without role embedding)
+ ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="role-free")
+ 
+ # Inference version with role embedding
+ ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-default")
+ 
+ # Inference version without role embedding
+ ont = OntologyTransformer.from_pretrained("Hui97/OnT-MPNet-go", revision="inference-role-free")
+ ```
+ 
+ ### Full Model Architecture
+ 
+ ```
+ OntologyTransformer(
+   (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+ 
+ ## Usage
+ 
+ ### Installation
+ 
+ First, install the required dependencies:
+ 
+ ```bash
+ pip install sentence-transformers==3.4.0.dev0
+ ```
+ 
+ You also need to install [HierarchyTransformers](https://github.com/KRR-Oxford/HierarchyTransformers) following the instructions in their repository.
+ 
+ ### Direct Usage
+ 
+ Load the model and use it for ontology concept encoding:
+ 
+ ```python
+ import torch
+ from OnT import OntologyTransformer
+ 
+ # Load the OnT model
+ path = "Hui97/OnT-MPNet-go"
+ ont = OntologyTransformer.from_pretrained(path)
+ 
+ # Entity names to be encoded
+ entity_names = [
+     'alveolar atrium',
+     'organ part',
+     'superior recess of lesser sac',
+ ]
+ 
+ # Get the entity embeddings in hyperbolic space
+ entity_embeddings = ont.encode_concept(entity_names)
+ print(entity_embeddings.shape)
+ # [3, 768]
+ 
+ # Role sentences to be encoded
+ role_sentences = [
+     "application attribute",
+     "attribute",
+     "chemical modifier",
+ ]
+ 
+ # Get the role embeddings (rotations and scalings)
+ role_rotations, role_scalings = ont.encode_roles(role_sentences)
+ ```
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ If you use this model, please cite:
+ 
+ ```bibtex
+ @article{yang2025language,
+   title={Language Models as Ontology Encoders},
+   author={Yang, Hui and Chen, Jiaoyan and He, Yuan and Gao, Yongsheng and Horrocks, Ian},
+   journal={arXiv preprint arXiv:2507.14334},
+   year={2025}
+ }
+ ```
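The `encode_concept` call in the README returns points in hyperbolic space, so distances between concepts should be measured with a hyperbolic metric rather than Euclidean or cosine distance over raw coordinates. As a hedged sketch (assuming a Poincaré-ball model with curvature -1; the exact manifold and curvature used by OnT may differ), the geodesic distance between two embeddings can be computed as:

```python
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Geodesic distance in the Poincare ball (curvature -1, points with norm < 1)."""
    sq_u = (u * u).sum(dim=-1)
    sq_v = (v * v).sum(dim=-1)
    sq_diff = ((u - v) ** 2).sum(dim=-1)
    # d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    x = 1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v))
    return torch.acosh(x.clamp(min=1.0 + 1e-7))  # clamp keeps acosh in its domain

# e.g. distance between the 'alveolar atrium' and 'organ part' embeddings:
# d = poincare_distance(entity_embeddings[0], entity_embeddings[1])
```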
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.2",
+   "vocab_size": 30527
+ }
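This is the stock MPNet backbone configuration. A quick sanity check with the standard `transformers` API (a suggested check, not part of the original card) confirms the backbone type and dimensions:

```python
from transformers import AutoConfig

# Reads the config.json above from the model repo
cfg = AutoConfig.from_pretrained("Hui97/OnT-MPNet-go")
print(cfg.model_type, cfg.hidden_size, cfg.max_position_embeddings)
# mpnet 768 514
```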
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.0.dev0",
+     "transformers": "4.45.2",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37d583ab2805955adc75eaba6ac6cac412ba0ae9b1bf491acde94b64643d7d27
+ size 437967672
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
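`modules.json` composes the repository into a standard two-module sentence-transformers pipeline (Transformer followed by Pooling), so the text encoder can also be loaded directly with `sentence_transformers`. A hedged sketch of that usage follows; note that loading this way gives only the Euclidean text encoder, while the hyperbolic and role components are handled by the OnT wrapper, not by this pipeline:

```python
from sentence_transformers import SentenceTransformer

# Loads module 0 (Transformer) and module 1 (Pooling) as declared in modules.json
model = SentenceTransformer("Hui97/OnT-MPNet-go")
emb = model.encode(["alveolar atrium", "organ part"])
print(emb.shape)  # (2, 768)
```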
role_model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d986604b5ff3ba99ac133811cf84d2f3e6d033eb13d2205aaefdf640e00c84e2
+ size 1185770
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 256,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
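The tokenizer is the stock lowercasing MPNetTokenizer with `<s>`/`</s>` sentence markers. As a small check using the standard `transformers` API (a suggested sanity check, not part of the original card):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Hui97/OnT-MPNet-go")
enc = tok("alveolar atrium", return_tensors="pt")
print(tok.convert_ids_to_tokens(enc["input_ids"][0]))
# e.g. ['<s>', 'alveolar', 'atrium', '</s>'] (exact word-piece splits may vary)
```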
vocab.txt ADDED
The diff for this file is too large to render. See raw diff
 
wrapper_config.json ADDED
@@ -0,0 +1 @@
+ {"role_emd_mode": "sentenceEmbedding", "role_model_mode": "rotation"}