oscarwang2 committed · Commit 33025c8 · 1 Parent(s): 07edf6f

Create README.md

Files changed (1): README.md (+59, -0)
README.md ADDED
---
language: en
license: apache-2.0
tags:
- causal-lm
- transformers
- llama
- reflex-ai
---

# AMD Llama 350M Upgraded

## Model Description

**AMD Llama 350M Upgraded** is a transformer-based causal language model built on the Llama architecture and designed to generate human-like text. It is an upgrade of the original AMD Llama model, with the parameter count increased to 322 million, and is suitable for a range of natural language processing tasks, including text generation, completion, and conversational applications.

## Model Details

- **Model Type**: Causal Language Model
- **Architecture**: Llama
- **Number of Parameters**: 322 million
- **Input Size**: Variable-length input sequences
- **Output Size**: Variable-length output sequences

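Since the checkpoint is a standard Llama-architecture causal language model, it should also load through the high-level `pipeline` API from `transformers`. A minimal sketch, assuming the same `reflex-ai/AMD-Llama-500M-Upgraded` repo id used in the Usage section below:

```python
from transformers import pipeline

# Build a text-generation pipeline around the checkpoint;
# device=0 selects the first GPU, device=-1 falls back to CPU
generator = pipeline(
    "text-generation",
    model="reflex-ai/AMD-Llama-500M-Upgraded",
    device=-1,
)

# Generate a single continuation of up to 50 new tokens
result = generator("The quick brown fox", max_new_tokens=50)
print(result[0]["generated_text"])
```

For finer control over tokenization, device placement, and decoding, the explicit loading path in the Usage section below is the better starting point.
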
## Usage

To use the AMD Llama 350M Upgraded model, you can use the `transformers` library. Here is a sample snippet to get started:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

# Load the tokenizer and model
model_name = "reflex-ai/AMD-Llama-500M-Upgraded"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = LlamaForCausalLM.from_pretrained(model_name)

# Llama tokenizers ship without a pad token; reuse EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Move the model to the GPU if one is available, and set evaluation mode
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Function to generate text
def generate_text(prompt, max_length=50):
    # The tokenizer call returns both input_ids and attention_mask
    inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            num_return_sequences=1,
            pad_token_id=tokenizer.pad_token_id,
        )

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
prompt = "Once upon a time in a land far away,"
generated_output = generate_text(prompt, max_length=100)
print(generated_output)
```
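
Greedy decoding, the `generate` default used above, can turn repetitive on open-ended prompts. A short sketch of sampled decoding, reusing the `model`, `tokenizer`, `device`, and `prompt` defined above; the parameter values are illustrative, not tuned for this model:

```python
# Sampled decoding: trades determinism for variety
inputs = tokenizer(prompt, return_tensors="pt").to(device)

with torch.no_grad():
    sampled = model.generate(
        **inputs,
        max_new_tokens=80,    # cap on newly generated tokens (prompt excluded)
        do_sample=True,       # stochastic sampling instead of greedy decoding
        temperature=0.8,      # <1.0 sharpens the distribution, >1.0 flattens it
        top_p=0.9,            # nucleus sampling: smallest token set with 90% probability mass
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```

Lower `temperature` and `top_p` keep the output close to the model's most likely continuations; higher values add diversity at some cost in coherence.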