liamcripwell committed
Commit 8521c8e • Parent: b5f53bc
Update README.md

README.md CHANGED
@@ -15,7 +15,7 @@ base_model: Qwen/Qwen2.5-0.5B
 
 # NuExtract-tiny-v1.5 by NuMind 🔥
 
-NuExtract-v1.5 is a fine-tuning of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B), trained on a private high-quality dataset for structured information extraction. It supports long documents and several languages (English, French, Spanish, German, Portuguese, and Italian).
+NuExtract-tiny-v1.5 is a fine-tuning of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B), trained on a private high-quality dataset for structured information extraction. It supports long documents and several languages (English, French, Spanish, German, Portuguese, and Italian).
 To use the model, provide an input text and a JSON template describing the information you need to extract.
 
 Note: This model is trained to prioritize pure extraction, so in most cases all text generated by the model is present as is in the original text.
@@ -58,7 +58,7 @@ def predict_NuExtract(model, tokenizer, texts, template, batch_size=1, max_lengt
 
     return [output.split("<|output|>")[1] for output in outputs]
 
-model_name = "numind/NuExtract-v1.5"
+model_name = "numind/NuExtract-tiny-v1.5"
 device = "cuda"
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device).eval()
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
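The diff's README text says to provide an input text and a JSON template, and its `predict_NuExtract` snippet splits model output on `"<|output|>"`. A minimal sketch of how such a prompt might be assembled is below; the `<|input|>` marker and the `### Template:`/`### Text:` section headers are assumptions inferred from that split call, not a confirmed specification, and `build_prompt` is a hypothetical helper.

```python
import json

def build_prompt(text: str, template: str) -> str:
    """Assemble an extraction prompt from a document and a JSON template.

    Assumed layout: an <|input|> marker, the pretty-printed template, the
    source text, then a trailing <|output|> marker so the generated JSON
    can be recovered with output.split("<|output|>")[1] as in the diff.
    """
    # Round-trip through json to validate the template and normalize indentation.
    template_str = json.dumps(json.loads(template), indent=4)
    return f"<|input|>\n### Template:\n{template_str}\n### Text:\n{text}\n\n<|output|>"

template = '{"name": "", "affiliation": ""}'
prompt = build_prompt("John Smith works at NuMind.", template)
```

The resulting string would be tokenized and passed to `model.generate` as in the README's snippet; everything after the final `<|output|>` marker is then the extracted JSON.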