---
license: mit
datasets:
- ai4privacy/pii-masking-400k
language:
- en
- de
- fr
- it
- es
- nl
base_model:
- iiiorg/piiranha-v1-detect-personal-information
tags:
- NeuralWave
- Hackathon
---

## Overview

This model improves the precision and accuracy of personal information detection by using a smaller label set than its base model. The reduced label set aims to provide more precise labeling of personal information across multiple languages.

---

## Features

- **Improved Precision**: Reducing the label set relative to the base model improves labeling precision, for more reliable identification of sensitive information.
- **Model Versions**:
  - **Maximum Accuracy Focus**: This version aims for the highest possible accuracy in detection, making it suitable for applications where minimizing overall errors is crucial.
  - **Maximum Precision Focus**: This variant is designed to maximize detection precision, ideal for scenarios where false positives are particularly undesirable.

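
The difference between the two versions comes down to which metric is optimized. The sketch below is only an illustration of how token-level accuracy and precision differ; it is not the evaluation code used for this model, and the label names (`NAME`, `EMAIL`, `O`) and the helper `token_metrics` are hypothetical.

```python
def token_metrics(gold, pred, negative_label="O"):
    """Return (accuracy, precision) for two equal-length label sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    # Accuracy: share of all tokens whose predicted label matches the gold label.
    correct = sum(g == p for g, p in pairs)
    accuracy = correct / len(pairs) if pairs else 0.0
    # Precision: among tokens flagged as PII, the share labeled correctly.
    flagged = [(g, p) for g, p in pairs if p != negative_label]
    true_positives = sum(g == p for g, p in flagged)
    precision = true_positives / len(flagged) if flagged else 0.0
    return accuracy, precision


# Hypothetical labels, for illustration only.
gold = ["O", "NAME", "NAME", "O", "EMAIL", "O"]
eager = ["O", "NAME", "NAME", "NAME", "EMAIL", "O"]  # one false positive
cautious = ["O", "NAME", "O", "O", "O", "O"]         # misses PII, no false positives

print(token_metrics(gold, eager))     # higher accuracy, lower precision
print(token_metrics(gold, cautious))  # lower accuracy, perfect precision
```

In this toy case the "eager" predictions score better on accuracy while the "cautious" ones score better on precision, which is the kind of trade-off the two model versions target.
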
---

## Installation

To run this model, install the dependencies:

```bash
pip install torch transformers safetensors
```

---

## Usage

Load and run the model using PyTorch and transformers:

```python
import json

from transformers import AutoModelForTokenClassification, AutoConfig, BertTokenizerFast
from safetensors.torch import load_file


class LabelMapper:
    """Minimal stand-in container; the original LabelMapper class is not included in this README."""
    label_to_id = None
    id_to_label = None
    num_labels = None


# Load the config
config = AutoConfig.from_pretrained("folder_to_model")

# Initialize the model with the config
model = AutoModelForTokenClassification.from_config(config)

# Load the safetensors weights (load_file expects the path to the weights file)
state_dict = load_file("folder_to_tensors")

# Load the state dict into the model
model.load_state_dict(state_dict)

# Load the tokenizer
tokenizer = BertTokenizerFast.from_pretrained("google-bert/bert-base-multilingual-cased")

# Load the label mapper if needed
with open("pii_model/label_mapper.json", 'r') as f:
    label_mapper_data = json.load(f)

label_mapper = LabelMapper()
label_mapper.label_to_id = label_mapper_data['label_to_id']
# JSON object keys are strings, so convert the ids back to integers
label_mapper.id_to_label = {int(k): v for k, v in label_mapper_data['id_to_label'].items()}
label_mapper.num_labels = label_mapper_data['num_labels']

# Process outputs for analysis...
```

---

## Evaluation

- **Accuracy Model**: Tuned to minimize overall errors and achieve the highest accuracy.
- **Precision Model**: Tuned to minimize false positives, for precision-driven applications.

---

## Disclaimer

The publisher of this repository is not affiliated with Ai4Privacy or Ai Suisse SA.

## Honorary Mention

This repo was created during the Hackathon organized by [NeuralWave](https://neuralwave.ch/#/).
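
## Example: Turning Predictions into Spans

The Usage snippet ends at "Process outputs for analysis". One possible way to do that step is sketched below, under several assumptions: that per-token label ids come from something like `logits.argmax(-1)`, that character offsets come from calling the tokenizer with `return_offsets_mapping=True`, that `"O"` marks non-PII tokens, and that labels may carry `B-`/`I-` prefixes. The helper `merge_token_predictions` and the `EMAIL` label are hypothetical and not part of this repository.

```python
def merge_token_predictions(text, offsets, label_ids, id_to_label, outside_label="O"):
    """Merge consecutive tokens that share a label into character-level spans."""
    spans = []
    current = None  # span under construction: [label, start, end]
    for (start, end), label_id in zip(offsets, label_ids):
        if start == end:
            # Special tokens such as [CLS] and [SEP] have empty offsets; skip them.
            continue
        raw_label = id_to_label[label_id]
        begins_entity = raw_label.startswith("B-")
        # Strip an optional BIO prefix so "B-EMAIL" and "I-EMAIL" can merge.
        label = raw_label[2:] if raw_label[:2] in ("B-", "I-") else raw_label
        if label == outside_label:
            if current is not None:
                spans.append(current)
                current = None
            continue
        if current is not None and current[0] == label and not begins_entity:
            current[2] = end  # extend the running span
        else:
            if current is not None:
                spans.append(current)
            current = [label, start, end]
    if current is not None:
        spans.append(current)
    return [
        {"label": label, "start": start, "end": end, "text": text[start:end]}
        for label, start, end in spans
    ]


# Hand-written inputs standing in for real tokenizer and model outputs.
text = "Mail anna@example.com today"
offsets = [(0, 0), (0, 4), (5, 9), (9, 10), (10, 17), (17, 18), (18, 21), (22, 27), (0, 0)]
label_ids = [0, 0, 1, 1, 1, 1, 1, 0, 0]
id_to_label = {0: "O", 1: "I-EMAIL"}  # hypothetical label map

for span in merge_token_predictions(text, offsets, label_ids, id_to_label):
    print(span)
```

With real model outputs, `id_to_label` would be the mapping loaded from `label_mapper.json`, and the resulting spans can be used to mask or redact the matched text.
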