---
language:
- en
license: apache-2.0
tags:
- NLP
pipeline_tag: summarization
---

# Topic Change Point Detection Model

## Model Details

- **Model Name:** Falconsai/topic_change_point
- **Model Type:** Fine-tuned `google/t5-small`
- **Language:** English
- **License:** Apache 2.0

## Overview

The Topic Change Point Detection model is designed to identify topics and track how they change within a block of text. It is based on the `google/t5-small` model, fine-tuned on a custom dataset that maps texts to their respective topic changes. The model can be used to analyze and categorize texts according to their topics and the transitions between them.

### Model Architecture

The base model architecture is T5 (Text-to-Text Transfer Transformer), which treats every NLP problem as a text-to-text problem. The specific version used here is `google/t5-small`, fine-tuned to identify topics and predict the points at which they change.

### Fine-Tuning Data

The model was fine-tuned on a dataset of texts paired with their corresponding topic changes, supplied as a file with two columns: `text` and `topic_changes`.

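For illustration, such a two-column file might be loaded with the `datasets` library as sketched below; the file name `topic_changes.csv` is a hypothetical placeholder, and only the two column names come from this card.

```python
# Illustrative only: `topic_changes.csv` stands in for the custom dataset
# described above (columns: text, topic_changes).
from datasets import load_dataset

dataset = load_dataset("csv", data_files="topic_changes.csv")["train"]
print(dataset.column_names)          # expected: ['text', 'topic_changes']
print(dataset[0]["text"], "->", dataset[0]["topic_changes"])
```
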
## Intended Use

The model is intended for identifying topics and detecting changes in topic across a block of text. It can be useful in psychology and psychiatry for session assessment (the initial use case), content analysis, document insights, conversational analysis, and other areas where understanding the flow of topics is important.

## How to Use

### Inference

To use this model for inference, load the fine-tuned model and tokenizer. The examples below use the `transformers` library.

#### Running the Pipeline

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

text_block = 'Your block of text here.'

pipe = pipeline("summarization", model="Falconsai/topic_change_point")
res1 = pipe(text_block, max_length=1024, min_length=512, do_sample=False)
print(res1)
```

#### Running on CPU

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point")

input_text = 'Your block of text here.'
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

#### Running on GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("Falconsai/topic_change_point")
model = AutoModelForSeq2SeqLM.from_pretrained("Falconsai/topic_change_point", device_map="auto")

input_text = 'Your block of text here.'
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

outputs = model.generate(input_ids)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training

The training process involves the following steps:

1. **Load and Explore Data:** Load the dataset and perform an initial exploration to understand the data distribution.
2. **Preprocess Data:** Tokenize the text blocks and prepare them for the T5 model.
3. **Fine-Tune Model:** Fine-tune the `google/t5-small` model using the preprocessed data (a sketch follows this list).
4. **Evaluate Model:** Evaluate the model's performance on a validation set to ensure it is learning correctly.
5. **Save Model:** Save the fine-tuned model for future use.

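These steps could be realized with the Hugging Face `Seq2SeqTrainer` roughly as follows. This is a minimal sketch, not the authors' actual recipe: the file name, split ratio, sequence lengths, and hyperparameters are all illustrative assumptions; only the `text` and `topic_changes` columns come from this card.

```python
# Hypothetical fine-tuning sketch; names and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("t5-small")  # google/t5-small base
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Steps 1-2: load the two-column CSV described above and tokenize it.
raw = load_dataset("csv", data_files="topic_changes.csv")["train"]
splits = raw.train_test_split(test_size=0.1)

def preprocess(batch):
    model_inputs = tokenizer(batch["text"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["topic_changes"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = splits.map(preprocess, batched=True, remove_columns=["text", "topic_changes"])

# Step 3: fine-tune on the training split.
args = Seq2SeqTrainingArguments(
    output_dir="topic_change_point",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Step 4: check held-out loss on the validation split.
print(trainer.evaluate())

# Step 5: save the fine-tuned model and tokenizer for future use.
trainer.save_model("topic_change_point")
tokenizer.save_pretrained("topic_change_point")
```
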
## Evaluation

The model's performance should be evaluated on a separate validation set to ensure it accurately predicts topic change points. Metrics such as accuracy, precision, recall, and F1 score can be used to assess its performance.

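As a sketch, once predictions and references are aligned into comparable per-segment labels, the listed metrics could be computed with `scikit-learn`; the label encoding below ("same"/"change") is an illustrative assumption, not part of this card.

```python
# Illustrative evaluation sketch; assumes predictions and references have
# been mapped to parallel lists of per-segment labels.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = ["same", "change", "same", "change"]    # reference labels (assumed)
y_pred = ["same", "change", "change", "change"]  # model outputs, mapped

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", pos_label="change"
)
print(f"accuracy:  {accuracy_score(y_true, y_pred):.2f}")
print(f"precision: {precision:.2f}")
print(f"recall:    {recall:.2f}")
print(f"f1:        {f1:.2f}")
```
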
## Limitations

- **Data Dependency:** The model's performance is highly dependent on the quality and representativeness of the training data.
- **Generalization:** The model may not generalize well to texts that differ significantly from the training data.

## Ethical Considerations

When deploying the model, be mindful of the ethical implications, including but not limited to:

- **Privacy:** Ensure that text data used for training and inference does not contain sensitive or personally identifiable information.
- **Bias:** Be aware of potential biases in the training data that could affect the model's predictions.

## License

This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.

## Citation

If you use this model in your research, please cite it as follows:

```bibtex
@misc{topic_change_point,
  author    = {Michael Stattelman},
  title     = {Topic Change Point Detection},
  year      = {2024},
  publisher = {Falcons.ai},
}
```

---