phi-2-pl-v_0_1

This model is based on the microsoft/phi-2 architecture. It was trained from scratch on the 20231201 Polish Wikipedia dump.

Model description

The model was trained with a context length of 2048 tokens.
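
As a rough illustration of what the 2048-token limit means for longer inputs, the sketch below splits a list of token ids into consecutive windows no longer than the context length. The helper name `split_into_windows` and the plain-list input are assumptions made for illustration; they are not taken from the original training code.

```python
CONTEXT_LENGTH = 2048  # context length stated in this model card


def split_into_windows(token_ids, window=CONTEXT_LENGTH):
    """Split a list of token ids into consecutive windows of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [token_ids[i:i + window] for i in range(0, len(token_ids), window)]


# Hypothetical input: 5000 dummy token ids standing in for a long article.
windows = split_into_windows(list(range(5000)))
print([len(w) for w in windows])  # [2048, 2048, 904]
```

Non-overlapping windows are the simplest choice; overlapping windows are a common alternative when context across window boundaries matters.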

Intended uses & limitations

The model is intended for research purposes only. It may generate fictitious, incorrect, unethical, or biased texts. In its current state, it is not suitable for production use.

Example:

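import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Hugging Face repository id of this model
model_name = "teddy-f-47/phi-pl-2_7B-v_0_1"

# Load the tokenizer shipped with the model
tokenizer = AutoTokenizer.from_pretrained(
    model_name, trust_remote_code=True, use_fast=True
)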
model = AutoModelForCausalLM.from_pretrained(
    model_name, vocab_size=len(tokenizer), attn_implementation="flash_attention_2",
    trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

generation_config = GenerationConfig.from_pretrained(
    model_name, do_sample=False, repetition_penalty=1.5,
    min_new_tokens=1, max_new_tokens=128
)

# Move the inputs to the device the model was placed on by device_map="auto"
test_input = tokenizer("Wrocław to polski miasto. Wrocław jest ", return_tensors='pt').to(model.device)
test_output = model.generate(**test_input, generation_config=generation_config)
test_preds = tokenizer.batch_decode(sequences=test_output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(test_preds)
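
The example above uses greedy decoding (`do_sample=False`) together with `repetition_penalty=1.5`. The sketch below shows, on a toy logit vector, how a penalty of this kind can change the greedy choice: logits of tokens that were already generated are divided by the penalty when positive and multiplied by it when negative. The function names and the toy values are assumptions for illustration only; this is not the library's actual implementation.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.5):
    """Return a copy of `logits` with already-seen tokens made less likely."""
    penalized = list(logits)
    for tok in set(seen_token_ids):
        score = penalized[tok]
        # Positive scores shrink, negative scores move further from zero.
        penalized[tok] = score * penalty if score < 0 else score / penalty
    return penalized


def greedy_pick(logits):
    """Index of the highest logit, i.e. the greedy decoding choice."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical 4-token vocabulary; tokens 0 and 2 were already generated.
logits = [3.0, 2.5, -1.0, 0.5]
seen = [0, 2]
penalized = apply_repetition_penalty(logits, seen)
print(penalized)                                    # [2.0, 2.5, -1.5, 0.5]
print(greedy_pick(logits), greedy_pick(penalized))  # 0 1
```

Without the penalty the already-used token 0 would be chosen again; with it, token 1 wins.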

Training and evaluation data

The 20231201 Polish Wikipedia dump.

Training procedure

Training environment

  • GPU: 1 x A100X (80GB)

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • num_devices: 1
  • train_batch_size: 8
  • gradient_accumulation_steps: 1
  • optimizer: Adam with betas=(0.9,0.98) and epsilon=1e-07
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
  • precision: bf16
  • seed: 42
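
To make the `cosine` scheduler and the warmup ratio of 0.1 concrete, here is a minimal sketch of the learning rate as a function of the training step. It assumes a linear warmup followed by a single cosine decay to zero, which is a common reading of these settings; the total step count is hypothetical because the card does not report it, and the function name `lr_at_step` is introduced only for this example.

```python
import math

PEAK_LR = 2e-4      # learning_rate from the list above
WARMUP_RATIO = 0.1  # lr_scheduler_warmup_ratio from the list above


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical value, not reported in the card
for step in (0, 50, 100, 550, 1000):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

Under these assumptions the rate rises to 0.0002 over the first 10% of steps, passes through half the peak midway through the decay, and reaches zero at the final step.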

Training results

  • runtime: 1mo 3d 9h 40m 16s
  • train_loss: 2.983
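
For orientation, the training loss can be converted to a perplexity. This assumes the reported value is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in the card.

```python
import math

train_loss = 2.983  # train_loss reported above
perplexity = math.exp(train_loss)  # valid only if the loss is mean cross-entropy in nats
print(f"{perplexity:.2f}")  # 19.75
```

This is a training-set figure, so it says nothing by itself about performance on held-out text.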

Framework versions

  • Transformers 4.37.1
  • PyTorch 2.1.2
  • Datasets 2.16.1
  • Tokenizers 0.15.1

Model size

  • parameters: 2.78B
  • tensor type: F32 (Safetensors)

Model tree for teddy-f-47/phi-pl-2_7B-v_0_1

  • base model: microsoft/phi-2