metadata
license: cc-by-4.0
language:
  - he
inference: false

DictaLM: A Large Generative Language Model for Modern Hebrew

A large generative pretrained transformer (GPT) language model for Hebrew, released [link to be added].

This model was fine-tuned to follow instructions. Examples of the kinds of prompts it handles:

  • General questions:

    מה זה בית ספר?
    (English: "What is a school?")
    
    קיבלתי חתך קל באצבע. מהי הדרך הנכונה לטפל בזה?
    (English: "I got a small cut on my finger. What is the right way to treat it?")
    
  • Simple tasks:

    תציע כמה רעיונות לפעילות עם ילדים בני 5:
    (English: "Suggest a few ideas for an activity with 5-year-old children:")
    
  • Information retrieval from a paragraph context:

        המסיק הידני הוא הדרך המסורתית והעתיקה לקטיף זיתים. שיטה זו דורשת כוח אדם רב באופן יחסי ועדיין מקובלת בישראל ובמקומות רבים בעולם. שיטות מסיק ידני מאפשרות חיסכון עלויות במקומות בהם כוח האדם זול ועלות השיטות הממוכנות גבוהה. לזיתים המיועדים למאכל (לכבישה, בניגוד לזיתים לשמן) מתאים יותר מסיק ידני כיוון שהפרי פחות נפגע במהלך המסיק בשיטה זו (פגיעות בקליפת הפרי בזיתים לשמן פחות משמעותיות). כמו כן מועדף מסיק ידני באזורים בהם הטופוגרפיה המקומית או צפיפות העצים לא מאפשרים גישה נוחה לכלים מכנים. השיטה הידנית מאפשרת גם למסוק עצים שונים במועדים שונים, בהתאם לקצב הבשלת הפרי הטבעי בכל עץ.
        
        על בסיס הפסקה הזאת, מה הוא היתרון של מסיק ידני מבחינת קצב הבשלת הפרי?
        
        (English: "Manual harvesting is the traditional, ancient way of picking olives. This method requires a relatively large workforce and is still common in Israel and in many places around the world. Manual harvesting methods save costs where labor is cheap and mechanized methods are expensive. For table olives (for pickling, as opposed to olives for oil), manual harvesting is more suitable because the fruit is damaged less during harvesting with this method (damage to the fruit's skin matters less for oil olives). Manual harvesting is also preferred in areas where the local topography or the density of the trees does not allow convenient access for mechanical equipment. The manual method also makes it possible to harvest different trees at different times, according to the natural ripening rate of the fruit on each tree.
        
        Based on this paragraph, what is the advantage of manual harvesting in terms of the fruit's ripening rate?")
    
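The paragraph-context example above can be assembled programmatically. The following is a minimal sketch, not a documented prompt template: the helper name `build_context_prompt` is hypothetical, and the layout (paragraph, blank line, question, trailing newline) is only inferred from the example prompts in this README.

```python
def build_context_prompt(paragraph: str, question: str) -> str:
    """Join a context paragraph and a question into one prompt string.

    Assumed layout (inferred from the README examples, not a documented
    template): paragraph, blank line, question, trailing newline.
    """
    return f"{paragraph.strip()}\n\n{question.strip()}\n"


# First sentence of the example paragraph:
# "Manual harvesting is the traditional, ancient way of picking olives."
paragraph = "המסיק הידני הוא הדרך המסורתית והעתיקה לקטיף זיתים."
# "Based on this paragraph, what is the advantage of manual harvesting
# in terms of the fruit's ripening rate?"
question = "על בסיס הפסקה הזאת, מה הוא היתרון של מסיק ידני מבחינת קצב הבשלת הפרי?"

prompt = build_context_prompt(paragraph, question)
print(prompt)
```

The resulting string can be passed to the tokenizer in the same way as the single-line prompt in the sample usage.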

Sample usage:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictalm-7b-instruct')
# If you don't have cuda installed, remove the `.cuda()` call at the end
model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True).cuda()

model.eval()

with torch.inference_mode():
    # Hebrew prompt: "Suggest a few ideas for an activity with 5-year-old children:"
    prompt = 'תציע כמה רעיונות לפעילות עם ילדים בני 5:\n'
    kwargs = dict(
        inputs=tokenizer(prompt, return_tensors='pt').input_ids.to(model.device),
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.75,
        max_length=100,
        min_new_tokens=5
    )
    
    print(tokenizer.batch_decode(model.generate(**kwargs), skip_special_tokens=True))

Alternative ways to initialize the model:

If you have several smaller GPUs and the accelerate package is installed, you can initialize the model split across the devices:

model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True, device_map='auto')

If you are running on Linux and have the bitsandbytes package installed, you can initialize the model in 4-bit or 8-bit inference mode (8-bit shown here):

model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True, load_in_8bit=True)
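To see why reduced precision helps on smaller GPUs, here is a rough back-of-the-envelope estimate of the memory taken by the weights alone. It is a sketch under stated assumptions: the parameter count of about 7 billion is taken from the "7b" in the model name, and the figures ignore activations, the KV cache, and any layers a quantization library keeps in higher precision, so real usage will be higher.

```python
# Approximate bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, precision: str) -> float:
    """Rough memory for the weights alone, in GiB (no activations or overhead)."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3


# Assumption: about 7 billion parameters, taken from the "7b" in the model name.
NUM_PARAMS = 7e9

for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_memory_gib(NUM_PARAMS, precision):.1f} GiB")
```

Each halving of the bytes per parameter halves the estimate, which is the main reason 8-bit or 4-bit loading can fit the model on a single consumer GPU.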

If you have FlashAttention installed in your environment, you can instruct the model to use the flash attention implementation (either V1 or V2, whichever is installed):

model = AutoModelForCausalLM.from_pretrained('dicta-il/dictalm-7b-instruct', trust_remote_code=True, use_flash_attention=True)
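The alternatives above can be selected automatically based on what is installed. The helper below is a hypothetical sketch, not part of the model's code: `pick_load_kwargs` is an invented name, the import name `flash_attn` for FlashAttention is assumed, and whether these options can be combined for this model is also an assumption, so it is worth checking each one separately first.

```python
import importlib.util


def pick_load_kwargs(gpu_count: int, low_memory: bool = False) -> dict:
    """Hypothetical helper: choose extra `from_pretrained` keyword arguments."""

    def installed(package: str) -> bool:
        # True if the top-level package can be imported in this environment.
        return importlib.util.find_spec(package) is not None

    kwargs = {"trust_remote_code": True}
    if gpu_count > 1 and installed("accelerate"):
        # Split the model across all visible GPUs.
        kwargs["device_map"] = "auto"
    if gpu_count >= 1 and low_memory and installed("bitsandbytes"):
        # Load the weights in 8-bit to reduce GPU memory use.
        kwargs["load_in_8bit"] = True
    if gpu_count >= 1 and installed("flash_attn"):
        # Flag from this README; handled by the model's remote code.
        kwargs["use_flash_attention"] = True
    return kwargs


# Intended use (not executed here):
# model = AutoModelForCausalLM.from_pretrained(
#     'dicta-il/dictalm-7b-instruct',
#     **pick_load_kwargs(torch.cuda.device_count()),
# )
print(pick_load_kwargs(gpu_count=0))
```

With no GPU, the helper returns only `trust_remote_code=True`, which matches the plain CPU initialization.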

There are many parameters you can pass in kwargs to get different results (greedy decoding, beam search, different sampling configurations, longer or shorter responses, etc.).

For the full list of parameters you can pass to the generate function, see the text generation section of the Hugging Face Transformers documentation.
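As an illustration of the strategies mentioned above, the sketch below builds keyword arguments for `model.generate` for three common decoding modes. The helper name `generation_kwargs` is hypothetical, the sampling values are copied from the sample usage, and the beam count of 4 is an arbitrary illustrative choice. It uses `max_new_tokens`, which counts only generated tokens, whereas `max_length` also counts the prompt tokens.

```python
def generation_kwargs(strategy: str, max_new_tokens: int = 100) -> dict:
    """Return keyword arguments for `model.generate` for a named strategy."""
    common = {"max_new_tokens": max_new_tokens}
    if strategy == "greedy":
        # Always pick the single most likely next token.
        return {**common, "do_sample": False, "num_beams": 1}
    if strategy == "beam":
        # Keep several candidate sequences and return the best one.
        return {**common, "do_sample": False, "num_beams": 4, "early_stopping": True}
    if strategy == "sampling":
        # Same sampling values as the sample usage in this README.
        return {**common, "do_sample": True, "top_k": 50, "top_p": 0.95, "temperature": 0.75}
    raise ValueError(f"unknown strategy: {strategy!r}")


for name in ("greedy", "beam", "sampling"):
    print(name, generation_kwargs(name))
```

A call would then look like `model.generate(inputs=input_ids, **generation_kwargs("beam"))`, where `input_ids` is the tokenized prompt from the sample usage.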

Citation

If you use DictaLM in your research, please cite ADD CITATION HERE

BibTeX:

ADD BIBTEXT HERE

License


This work is licensed under a Creative Commons Attribution 4.0 International License.
