---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- seo
- llm
---
# Model Card for Llama2_Finetuned_SEO_Instruction_Set
A Llama-2-7B fine-tune that attempts to extract SEO metadata from web pages: keywords, description, and header counts.
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** Israel N.
- **Model type:** Llama-2-7B
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** TinyPixel/Llama-2-7B-bf16-sharded
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
### Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
Expediting offline SEO analysis
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
Currently the model does not respond to site content or metadata as intended; a more refined dataset may be needed for it to work.
## How to Get Started with the Model
```bash
pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
pip install -q datasets bitsandbytes einops
```
Import `AutoModelForCausalLM` and use `AutoModelForCausalLM.from_pretrained` to load the model from "israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set".
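A minimal loading-and-generation sketch. It assumes the repository loads directly with `AutoModelForCausalLM` (if it holds only LoRA adapter weights, `peft.AutoPeftModelForCausalLM` would be needed instead), and the prompt wording is illustrative, since the exact instruction template is not documented here:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "israelNwokedi/Llama2_Finetuned_SEO_Instruction_Set"

# Load the tokenizer and the fine-tuned model onto the available device.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Hypothetical prompt: the instruction format is an assumption, not documented in this card.
prompt = "Extract the keywords, description, and header counts (h1-h6) from the following page:\n<page text>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```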
## Training Details
### Training Data
<!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Prompts: entire sites and backlinks scraped from the web.
Outputs: keywords, descriptions, and header counts (h1-h6).

These are the main components of the dataset. Additional samples use ChatGPT-generated metadata as prompts together with the corresponding outputs.
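A hypothetical record illustrating the prompt/output structure described above; all field names and values are invented for illustration, as the actual dataset schema is not documented in this card:

```python
sample = {
    "prompt": "<html>...scraped page content and backlinks...</html>",
    "output": {
        "keywords": ["seo", "metadata", "analysis"],
        "description": "A short meta description of the page.",
        "header_counts": {"h1": 1, "h2": 4, "h3": 6, "h4": 0, "h5": 0, "h6": 0},
    },
}
```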
### Training Procedure
<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
Fine-tuning of the pre-trained Hugging Face model "TinyPixel/Llama-2-7B-bf16-sharded" using LoRA with QLoRA-style 4-bit quantization.
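A sketch of the LoRA setup with `peft`; the hyperparameter values are common QLoRA defaults and are assumptions, as the card does not record the ones actually used (the full 4-bit quantization config is shown under Preprocessing below):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model in 4-bit; see the Preprocessing sketch for the full config.
base_model = AutoModelForCausalLM.from_pretrained(
    "TinyPixel/Llama-2-7B-bf16-sharded",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Illustrative LoRA hyperparameters (r, alpha, dropout, and target modules are assumptions).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapter matrices train
```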
#### Preprocessing
Used Transformers' `BitsAndBytesConfig` for lightweight (quantized) model training and the "TinyPixel/Llama-2-7B-bf16-sharded" tokenizer for encoding/decoding.
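A minimal sketch of the quantization config and tokenizer use; the specific 4-bit settings (nf4 quantization, fp16 compute dtype) are typical QLoRA defaults assumed here, not values confirmed by this card:

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig

base_id = "TinyPixel/Llama-2-7B-bf16-sharded"

# Assumed 4-bit settings; pass this as quantization_config when loading the base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

# Encoding/decoding round trip.
ids = tokenizer("Extract the SEO metadata for this page.", return_tensors="pt")
text = tokenizer.decode(ids["input_ids"][0], skip_special_tokens=True)
```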
#### Training Hyperparameters
- **Training regime:** 4-bit precision <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
<!-- This should link to a Data Card if possible. -->
Sampled from training data.
#### Metrics
<!-- These are the evaluation metrics being used, ideally with a description of why. -->
Not yet computed.
### Results
Initial testing showed the model generating additional artificial metadata as part of its text generation rather than extracting metadata from the input; this was not the intended use case.
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
<!-- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). -->
- **Hardware Type:** Tesla T4
- **Hours used:** 0.5
- **Cloud Provider:** Google Colaboratory
- **Compute Region:** Europe
- **Carbon Emitted:** 0.08 g CO2eq