---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- seo
- llm
---

# Model Card for Llama2_Finetuned_SEO_Instruction_Set

A Llama-2-7B model fine-tuned to extract SEO metadata from web pages: keywords, description, and header counts.

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->



- **Developed by:** Israel N.
- **Model type:** Llama-2-7B
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model [optional]:** TinyPixel/Llama-2-7B-bf16-sharded

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

Expediting offline SEO analysis by generating keywords, descriptions, and header counts for page content.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->
The model currently does not respond reliably to site content or metadata inputs; a more refined dataset may be needed for it to work as intended.

## How to Get Started with the Model 

```
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops
```

Import `AutoModelForCausalLM` and use its `from_pretrained` method to load the model from `israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set`.
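
A minimal loading and generation sketch, assuming the repository can be loaded directly with `AutoModelForCausalLM`; the prompt below is illustrative rather than the exact instruction format used in training:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set"

# Load the tokenizer and the fine-tuned model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place weights on the available GPU(s)
    torch_dtype=torch.float16,
)

# Illustrative prompt; the exact instruction format depends on the training set.
prompt = (
    "Extract the SEO keywords, description and header counts "
    "from the following page:\n<html>...</html>"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```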

## Training Details

### Training Data

<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- Prompts: entire sites and backlinks scraped from the web.
- Outputs: keywords, descriptions, and header counts (h1-h6).

These are the main components of the dataset. Additional samples use ChatGPT-generated metadata as prompts together with the corresponding outputs.

### Training Procedure 

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
Fine-tuning of the pre-trained "TinyPixel/Llama-2-7B-bf16-sharded" Hugging Face model using LoRA and QLoRA.
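
As a rough sketch of what such an adapter setup looks like with `peft` (the rank, alpha, dropout, and target modules here are assumed typical values, not the settings actually used):

```
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Base model; for QLoRA it would normally be loaded in 4-bit as shown under Preprocessing.
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyPixel/Llama-2-7B-bf16-sharded",
    device_map="auto",
)

# Hypothetical LoRA hyperparameters, for illustration only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```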

#### Preprocessing [optional]

Used Transformers' `BitsAndBytesConfig` for lightweight (quantized) model loading during training, and the "TinyPixel/Llama-2-7B-bf16-sharded" tokenizer for encoding/decoding.
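
For reference, a 4-bit quantized load along these lines; the specific quantization options are common QLoRA defaults and are assumptions, not documented settings:

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization configuration typically used for QLoRA-style training.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyPixel/Llama-2-7B-bf16-sharded",
    quantization_config=bnb_config,
    device_map="auto",
)

# Tokenizer from the base model, used for encoding/decoding.
tokenizer = AutoTokenizer.from_pretrained("TinyPixel/Llama-2-7B-bf16-sharded")
tokenizer.pad_token = tokenizer.eos_token
```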

#### Training Hyperparameters

- **Training regime:** 4-bit precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Data Card if possible. -->

The test examples were sampled from the training data.


#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

Evaluation metrics have not yet been computed.

### Results

Initial tests showed the model reconstructing additional artificial metadata as part of its text-generation output; however, this was not the intended use case.

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

<!-- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). -->

- **Hardware Type:** Tesla T4
- **Hours used:** 0.5
- **Cloud Provider:** Google Colaboratory
- **Compute Region:** Europe
- **Carbon Emitted:** 0.08