---
license: apache-2.0
language:
- en
pipeline_tag: text-classification
tags:
- Sentiment Analysis
- Language Models
---

# DistilSenti-Net42M: Context Distilled Small Language Model For Sentiment Analysis

## Model Architecture

- **Embedding Layer**: Converts tokenized input text into dense vectors.
- **CNN Layers**: Extract features from text sequences.
- **Vanilla RNN, LSTM**: Capture temporal dependencies in text.
- **Dense Layers**: Classify text into sentiment categories.

## Usage

You can use this model for sentiment analysis on text data. Here is sample code to load and use the model. The snippet expects a fitted `tokenizer` and `label_encoder`; the H5 section below shows how to build them.

```python
import re

from huggingface_hub import from_pretrained_keras
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load model
model = from_pretrained_keras("Ravinthiran/DistilSenti-Net42M")

# Example prediction function
def predict_sentiment(text, model, tokenizer, label_encoder):
    # Lowercase and strip punctuation
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    # Convert to a padded sequence of token indices
    sequence = tokenizer.texts_to_sequences([text])
    padded_sequence = pad_sequences(sequence, maxlen=100)
    # Predict and map the highest-scoring class back to its label
    pred = model.predict(padded_sequence)
    sentiment = label_encoder.inverse_transform(pred.argmax(axis=1))
    sentiment_score = pred[0]
    return sentiment[0], sentiment_score

# Example usage
# NOTE: tokenizer and label_encoder are not defined in this snippet;
# they must be fitted first (see the H5 example below).
new_text = "I recently started a new fitness program at a local wellness center, and it has been an incredibly positive experience."
predicted_sentiment, sentiment_score = predict_sentiment(new_text, model, tokenizer, label_encoder)
print(f"Predicted Sentiment: {predicted_sentiment}")
print(f"Sentiment Scores: {sentiment_score}")
```

## Using Keras

Download DistilSentiNet-42M.keras here: https://huggingface.co/Ravinthiran/Distilsenti-Net-42M/blob/main/DistilSentiNet-42M.keras

## Using HDF5 (H5)

Download DistilSentiNet-42M.h5 here: https://huggingface.co/Ravinthiran/DistilSenti-Net42M/blob/main/DistilSentiNet-42M.h5

```python
import re

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

# Load the saved Keras model (replace the placeholder with your local file path)
model_hybrid = load_model('< DistilSentiNet-42M.h5 File Path > or < DistilSentiNet-42M.keras File Path >')

# Sample data: supply the path to a CSV with 'text' and 'sentiment' columns
df = pd.read_csv("")

# Preprocessing: lowercase and strip punctuation
df['text'] = df['text'].str.lower().str.replace(r'[^\w\s]', '', regex=True)

# Encode labels
label_encoder = LabelEncoder()
df['label'] = label_encoder.fit_transform(df['sentiment'])

# Tokenization and padding
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(df['text'])
X = tokenizer.texts_to_sequences(df['text'])
X = pad_sequences(X, maxlen=100)

# Function to predict sentiment of new input text
# (uses the module-level label_encoder fitted above)
def predict_sentiment(text, tokenizer, model):
    # Preprocess the input text
    text = text.lower()
    text = re.sub(r'[^\w\s]', '', text)
    sequence = tokenizer.texts_to_sequences([text])
    padded_sequence = pad_sequences(sequence, maxlen=100)
    # Predict sentiment
    pred = model.predict(padded_sequence)
    sentiment = label_encoder.inverse_transform(pred.argmax(axis=1))
    sentiment_score = pred[0]
    return sentiment[0], sentiment_score

# Example usage
new_text = (
    "I recently started a new fitness program at a local wellness center, and it has been an incredibly positive experience. "
    "The trainers are highly knowledgeable and provide personalized guidance to help me achieve my fitness goals. "
    "The facilities are state-of-the-art, with a wide range of equipment and classes to choose from. "
    "The supportive community and motivating environment have made working out enjoyable and rewarding. "
    "I have already noticed significant improvements in my health and fitness levels, and the positive changes have greatly enhanced my overall well-being."
)
predicted_sentiment, sentiment_score = predict_sentiment(new_text, tokenizer, model_hybrid)
print(f"The sentiment of the input text is: {predicted_sentiment} with scores {sentiment_score}")
```
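
To make the preprocessing used in the snippets above more concrete, here is a minimal, dependency-free sketch of the input the model receives after lowercasing, punctuation stripping, index lookup, and padding to 100 tokens. It is an illustration only, not the Keras `Tokenizer` or `pad_sequences` implementation. The helper names `clean`, `build_vocab`, and `encode` are hypothetical, the two-sentence corpus is made up for the example, and the left-padding and left-truncation are assumed to mirror the `pad_sequences` defaults.

```python
import re
from collections import Counter

MAX_LEN = 100      # mirrors maxlen=100 in the snippets above
NUM_WORDS = 5000   # mirrors Tokenizer(num_words=5000) in the snippets above


def clean(text):
    """Lowercase and strip punctuation, as the snippets above do."""
    return re.sub(r"[^\w\s]", "", text.lower())


def build_vocab(texts):
    """Map each word to a 1-based index, most frequent first (0 is reserved for padding)."""
    counts = Counter(word for t in texts for word in clean(t).split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(text, vocab):
    """Turn text into a fixed-length list of indices, dropping unknown or rare words."""
    ids = [vocab[w] for w in clean(text).split() if w in vocab and vocab[w] < NUM_WORDS]
    ids = ids[-MAX_LEN:]                       # keep only the last MAX_LEN tokens
    return [0] * (MAX_LEN - len(ids)) + ids    # left-pad with zeros


# Toy corpus, invented for illustration
vocab = build_vocab(["Great service, great staff!", "Terrible service."])
encoded = encode("Great staff, unknown word", vocab)
print(len(encoded), encoded[-2:])
```

Every input ends up as a list of exactly 100 integers, with zeros on the left and the known word indices on the right. Words that were never seen while building the vocabulary are silently dropped, which is why the tokenizer used at prediction time should be the same one the model was trained with.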