aashish1904 committed on
Commit
c856889
1 Parent(s): 3e4ff2a

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +81 -0
README.md ADDED
---

pipeline_tag: text-generation
language:
- multilingual
inference: false
license: cc-by-nc-4.0
library_name: transformers

---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)


# QuantFactory/reader-lm-1.5b-GGUF
This is a quantized version of [jinaai/reader-lm-1.5b](https://huggingface.co/jinaai/reader-lm-1.5b) created using llama.cpp.

# Original Model Card


<br><br>

<p align="center">
<img src="https://aeiljuispo.cloudimg.io/v7/https://cdn-uploads.huggingface.co/production/uploads/603763514de52ff951d89793/AFoybzd5lpBQXEBrQHuTt.png?w=200&h=200&f=face" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
</p>

<p align="center">
<b>Trained by <a href="https://jina.ai/"><b>Jina AI</b></a>.</b>
</p>


# Intro

Jina Reader-LM is a series of models that convert HTML content to Markdown, which is useful for content-conversion tasks. The models are trained on a curated collection of HTML content paired with its corresponding Markdown.
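
As a toy illustration of the task (not of the model itself), the sketch below is a hand-written, rule-based converter covering only `<h1>`, `<p>`, and `<li>`; the class and function names are hypothetical. Reader-LM replaces brittle rules like these with a trained model:

```python
from html.parser import HTMLParser

class ToyMarkdownConverter(HTMLParser):
    """Toy rule-based HTML-to-Markdown converter covering only a few tags."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")   # heading prefix
        elif tag == "li":
            self.out.append("- ")   # bullet prefix

    def handle_data(self, data):
        if data.strip():
            self.out.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("h1", "p", "li"):
            self.out.append("\n")   # block-level tags end a line

def html_to_markdown(html: str) -> str:
    converter = ToyMarkdownConverter()
    converter.feed(html)
    return "".join(converter.out).strip()

print(html_to_markdown("<html><body><h1>Hello, world!</h1></body></html>"))  # # Hello, world!
```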

# Models

| Name           | Context Length | Download                                                        |
|----------------|----------------|-----------------------------------------------------------------|
| reader-lm-0.5b | 256K           | [🤗 Hugging Face](https://huggingface.co/jinaai/reader-lm-0.5b) |
| reader-lm-1.5b | 256K           | [🤗 Hugging Face](https://huggingface.co/jinaai/reader-lm-1.5b) |
+
45
+ # Evaluation
46
+
47
+ TBD
48
+
49
+ # Quick Start
50
+
51
+ To use this model, you need to install `transformers`:
52
+
53
+ ```bash
54
+ pip install transformers<=4.43.4
55
+ ```
56
+
57
+ Then, you can use the model as follows:
58
+
59
+ ```python
60
+ # pip install transformers
61
+ from transformers import AutoModelForCausalLM, AutoTokenizer
62
+ checkpoint = "jinaai/reader-lm-1.5b"
63
+
64
+ device = "cuda" # for GPU usage or "cpu" for CPU usage
65
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
66
+ model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
67
+
68
+ # example html content
69
+ html_content = "<html><body><h1>Hello, world!</h1></body></html>"
70
+
71
+ messages = [{"role": "user", "content": html_content}]
72
+ input_text=tokenizer.apply_chat_template(messages, tokenize=False)
73
+
74
+ print(input_text)
75
+
76
+ inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
77
+ outputs = model.generate(inputs, max_new_tokens=1024, temperature=0, do_sample=False, repetition_penalty=1.08)
78
+
79
+ print(tokenizer.decode(outputs[0]))
80
+ ```
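
Note that `tokenizer.decode(outputs[0])` returns the whole sequence, prompt and chat-template markers included. Below is a minimal post-processing sketch, assuming a Qwen2-style `<|im_start|>`/`<|im_end|>` template (an assumption about this tokenizer; verify against the checkpoint). Alternatively, decode only the new tokens via `tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)`:

```python
def extract_reply(decoded: str) -> str:
    """Return only the assistant's text from a decoded sequence that still
    contains Qwen2-style chat markers (the marker strings are an assumption)."""
    marker = "<|im_start|>assistant"
    if marker in decoded:
        decoded = decoded.split(marker, 1)[1]  # drop the echoed prompt
    return decoded.replace("<|im_end|>", "").replace("<|endoftext|>", "").strip()
```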