liamcripwell committed
Commit 8521c8e • Parent: b5f53bc
Update README.md

README.md CHANGED
@@ -15,7 +15,7 @@ base_model: Qwen/Qwen2.5-0.5B
 
 # NuExtract-tiny-v1.5 by NuMind 🔥
 
-NuExtract-v1.5 is a fine-tuning of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B), trained on a private high-quality dataset for structured information extraction. It supports long documents and several languages (English, French, Spanish, German, Portuguese, and Italian).
+NuExtract-tiny-v1.5 is a fine-tuning of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B), trained on a private high-quality dataset for structured information extraction. It supports long documents and several languages (English, French, Spanish, German, Portuguese, and Italian).
 To use the model, provide an input text and a JSON template describing the information you need to extract.
 
 Note: This model is trained to prioritize pure extraction, so in most cases all text generated by the model is present as is in the original text.
@@ -58,7 +58,7 @@ def predict_NuExtract(model, tokenizer, texts, template, batch_size=1, max_lengt
 
     return [output.split("<|output|>")[1] for output in outputs]
 
-model_name = "numind/NuExtract-v1.5"
+model_name = "numind/NuExtract-tiny-v1.5"
 device = "cuda"
 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True).to(device).eval()
 tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
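The diff's README text says to provide an input text and a JSON template, and its `predict_NuExtract` snippet splits model output on `"<|output|>"`. A minimal sketch of how such a prompt might be assembled is below; the `<|input|>` marker and the `### Template:`/`### Text:` section headers are assumptions inferred from that split call, not a confirmed specification, and `build_prompt` is a hypothetical helper.

```python
import json

def build_prompt(text: str, template: str) -> str:
    """Assemble an extraction prompt from a document and a JSON template.

    Assumed layout: an <|input|> marker, the pretty-printed template, the
    source text, then a trailing <|output|> marker so the generated JSON
    can be recovered with output.split("<|output|>")[1] as in the diff.
    """
    # Round-trip through json to validate the template and normalize indentation.
    template_str = json.dumps(json.loads(template), indent=4)
    return f"<|input|>\n### Template:\n{template_str}\n### Text:\n{text}\n\n<|output|>"

template = '{"name": "", "affiliation": ""}'
prompt = build_prompt("John Smith works at NuMind.", template)
```

The resulting string would be tokenized and passed to `model.generate` as in the README's snippet; everything after the final `<|output|>` marker is then the extracted JSON.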