Model description
This model is a fine-tuned version of intfloat/multilingual-e5-large, trained on Indonesian police news data for emotion classification.
How to use this model:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("faizaulia/e5-fine-tune-polri-news-emotion")
model = AutoModelForSequenceClassification.from_pretrained("faizaulia/e5-fine-tune-polri-news-emotion")
Label description:
0: Angry, 1: Fear, 2: Sad, 3: Neutral, 4: Happy, 5: Love
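The label IDs above can be mapped back to names once the model has produced its six raw scores (logits). The sketch below is a minimal illustration in plain Python, not the author's code: `ID2LABEL` and `decode_logits` are hypothetical names, and the example scores are made up rather than real model output. With the tokenizer and model loaded as shown earlier, the scores would typically come from something like `model(**tokenizer(text, return_tensors="pt")).logits[0].tolist()`.

```python
import math

# Label mapping copied from the label description above.
ID2LABEL = {0: "Angry", 1: "Fear", 2: "Sad", 3: "Neutral", 4: "Happy", 5: "Love"}


def decode_logits(logits):
    """Return (label, probability) for one row of six raw scores."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} scores, got {len(logits)}")
    # Numerically stable softmax: subtract the largest score before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up scores for illustration only, not real model output.
example_logits = [0.3, 2.9, 0.4, 1.1, -1.2, -0.8]
label, prob = decode_logits(example_logits)
print(label, round(prob, 3))
```

The softmax step is optional if only the label is needed, since the largest logit always wins; it is included here so a confidence value can be reported alongside the label.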
Input text example:
LAMPUNG, KOMPAS.com - Komplotan perampok yang menyekap satu keluarga di Kabupaten Lampung Timur ditembak aparat kepolisian. Komplotan ini menggondol uang sebanyak Rp 50 juta milik korban. Kapolres Lampung Timur, AKBP M Rizal Muchtar mengatakan, tiga dari empat pelaku ini telah ditangkap pada Senin (27/2/2023) dini hari.
Preprocessing:
import re
import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
nltk.download('wordnet')
stop_words = set(stopwords.words('indonesian'))

def remove_stopwords(text):
    # Drop Indonesian stop words from the whitespace-separated tokens
    words = text.split()
    words = [word for word in words if word not in stop_words]
    return ' '.join(words)

def clean_texts(text):
    text = re.sub('\n', ' ', text)  # Replace every '\n' with a space
    text = re.sub(' +', ' ', text)  # Collapse repeated spaces
    text = re.sub('[\u2013\u2014]', '-', text)  # Replace en and em dashes with '-'
    # Remove the news website/location prefix. The pattern is not anchored,
    # so up to 40 characters before any later '-' are removed as well.
    text = re.sub('(.{0,40})-', '', text)
    text = re.sub(r'[^a-zA-Z\s]', '', text)  # Remove non-alphabetic characters
    return text

def preprocess_text(text):
    # Lowercase, clean, then remove stop words
    text = text.lower()
    text = clean_texts(text)
    text = remove_stopwords(text)
    return text
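To see what these steps do to the input text example, the sketch below runs the same regular expressions on its first sentence. It is a self-contained variant for illustration only: to avoid the NLTK download, it passes in `DEMO_STOP_WORDS`, a hypothetical two-word stand-in for the Indonesian stop word list, so the real pipeline will likely remove more words than shown here.

```python
import re

# Hypothetical stand-in for NLTK's Indonesian stop word list (assumption).
DEMO_STOP_WORDS = {"yang", "di"}


def clean_texts(text):
    # Same substitutions, in the same order, as the preprocessing code above.
    text = re.sub('\n', ' ', text)
    text = re.sub(' +', ' ', text)
    text = re.sub('[\u2013\u2014]', '-', text)
    text = re.sub('(.{0,40})-', '', text)
    text = re.sub(r'[^a-zA-Z\s]', '', text)
    return text


def preprocess_text(text, stop_words):
    # Lowercase, clean, then drop stop words; split() also trims stray spaces.
    text = clean_texts(text.lower())
    return ' '.join(word for word in text.split() if word not in stop_words)


sample = ("LAMPUNG, KOMPAS.com - Komplotan perampok yang menyekap satu keluarga "
          "di Kabupaten Lampung Timur ditembak aparat kepolisian.")
result = preprocess_text(sample, DEMO_STOP_WORDS)
print(result)
# komplotan perampok menyekap satu keluarga kabupaten lampung timur ditembak aparat kepolisian
```

The "LAMPUNG, KOMPAS.com -" dateline is stripped by the prefix pattern, and punctuation is dropped by the final substitution. The cleaned string is presumably what should be passed to the tokenizer, since the model card lists this preprocessing alongside the usage instructions.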