Quantization made by Richard Erkhov.
PII-Model-Phi3-Mini - bnb 8bits
- Model creator: https://huggingface.co/ab-ai/
- Original model: https://huggingface.co/ab-ai/PII-Model-Phi3-Mini/
Original model description:
license: mit
language:
- en
pipeline_tag: text-generation
tags:
- LLM
- token classification
- nlp
- safetensor
- PyTorch
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: transformers
widget:
- text: My name is Sylvain and I live in Paris
  example_title: Parisian
- text: My name is Sarah and I live in London
  example_title: Londoner
PII Detection Model - Phi3 Mini Fine-Tuned
This repository contains a fine-tuned version of the Phi3 Mini model for detecting personally identifiable information (PII). The model is trained to recognize the PII entity types listed below, which makes it useful for data redaction, privacy protection, and compliance with data protection regulations.
Model Overview
Model Architecture
- Base Model: Phi3 Mini (microsoft/Phi-3-mini-4k-instruct)
- Fine-Tuned For: PII detection
- Framework: Hugging Face Transformers
Detected PII Entities
The model is capable of detecting the following PII entities:
Personal Information:
- firstname
- middlename
- lastname
- sex
- dob (Date of Birth)
- age
- gender
- height
- eyecolor
Contact Information:
- email
- phonenumber
- url
- username
- useragent
Address Information:
- street
- city
- state
- county
- zipcode
- country
- secondaryaddress
- buildingnumber
- ordinaldirection
Geographical Information:
- nearbygpscoordinate
Organizational Information:
- companyname
- jobtitle
- jobarea
- jobtype
Financial Information:
- accountname
- accountnumber
- creditcardnumber
- creditcardcvv
- creditcardissuer
- iban
- bic
- currency
- currencyname
- currencysymbol
- currencycode
- amount
Unique Identifiers:
- pin
- ssn
- imei (Phone IMEI)
- mac (MAC Address)
- vehiclevin (Vehicle VIN)
- vehiclevrm (Vehicle VRM)
Cryptocurrency Information:
- bitcoinaddress
- litecoinaddress
- ethereumaddress
Other Information:
- ip (IP Address)
- ipv4
- ipv6
- maskednumber
- password
- time
- ordinaldirection
- prefix
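As a rough illustration of the data redaction use case, the sketch below replaces detected values with bracketed entity labels. It assumes the model output has already been parsed into a dict that maps an entity name to a list of matched strings, which is the shape shown in the usage example's output comment. The `redact` helper and the `detected` dict are hypothetical and written by hand for this example; they are not part of the model or of the transformers library.

```python
def redact(text, entities):
    """Replace each detected value in `text` with a [LABEL] placeholder.

    `entities` is assumed to map an entity name to a list of matched strings.
    """
    # Flatten to (value, label) pairs and replace longer values first, so a
    # short value cannot overwrite part of a longer one.
    pairs = [
        (value, label)
        for label, values in entities.items()
        for value in values
        if value
    ]
    for value, label in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(value, f"[{label.upper()}]")
    return text


# Hand-written stand-in for a parsed model output (an assumption, not a real result).
detected = {"firstname": ["Sarah"], "city": ["London"]}
print(redact("My name is Sarah and I live in London", detected))
# prints: My name is [FIRSTNAME] and I live in [CITY]
```

Plain string replacement is the simplest choice here. It redacts every occurrence of a detected value, which may be more aggressive than wanted when a value is also a common word.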
Prompt Format
### Instruction:
Identify and extract the following PII entities from the text, if present: companyname, pin, currencyname, email, phoneimei, litecoinaddress, currency, eyecolor, street, mac, state, time, vehiclevin, jobarea, date, bic, currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, password, ip, useragent, accountname, city, gender, secondaryaddress, iban, sex, prefix, ipv4, maskednumber, url, username, lastname, creditcardcvv, county, vehiclevrm, ipv6, creditcardissuer, accountnumber, creditcardnumber. Return the output in JSON format.
### Input:
Greetings, Mason! Let's celebrate another year of wellness on 14/01/1977. Don't miss the event at 176,Apt. 388.
### Output:
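The template above can be filled programmatically. The following is a minimal sketch: `PII_ENTITIES` and `build_prompt` are hypothetical names introduced here, the entity names are copied from the instruction text above, and the exact line breaks between the three sections are an assumption based on the layout shown in this card.

```python
# Entity names copied from the instruction text in this card.
PII_ENTITIES = (
    "companyname, pin, currencyname, email, phoneimei, litecoinaddress, "
    "currency, eyecolor, street, mac, state, time, vehiclevin, jobarea, date, "
    "bic, currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, "
    "ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, "
    "ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, "
    "password, ip, useragent, accountname, city, gender, secondaryaddress, "
    "iban, sex, prefix, ipv4, maskednumber, url, username, lastname, "
    "creditcardcvv, county, vehiclevrm, ipv6, creditcardissuer, "
    "accountnumber, creditcardnumber"
).split(", ")


def build_prompt(input_text, entities=PII_ENTITIES):
    """Fill the Instruction / Input / Output template for one input text."""
    entity_list = ", ".join(entities)
    return (
        "### Instruction:\n"
        "Identify and extract the following PII entities from the text, "
        f"if present: {entity_list}. Return the output in JSON format.\n"
        "### Input:\n"
        f"{input_text}\n"
        "### Output: "
    )


prompt = build_prompt("Greetings, Mason! Let's celebrate another year of wellness on 14/01/1977.")
print(prompt)
```

Passing a shorter `entities` list narrows the instruction, but the model was presumably fine-tuned on the full list, so behavior with a subset is not guaranteed.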
Usage
Installation
To use this model, you'll need the transformers library and PyTorch installed (the example below requests PyTorch tensors):
pip install transformers torch
Run Inference
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model; generate() below requires a causal language model head
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("ab-ai/PII-Model-Phi3-Mini")
model = AutoModelForCausalLM.from_pretrained("ab-ai/PII-Model-Phi3-Mini").to(device)
input_text = "Hi Abner, just a reminder that your next primary care appointment is on 23/03/1926. Please confirm by replying to this email Nathen15@hotmail.com."
model_prompt = f"""### Instruction:
Identify and extract the following PII entities from the text, if present: companyname, pin, currencyname, email, phoneimei, litecoinaddress, currency, eyecolor, street, mac, state, time, vehiclevin, jobarea, date, bic, currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, password, ip, useragent, accountname, city, gender, secondaryaddress, iban, sex, prefix, ipv4, maskednumber, url, username, lastname, creditcardcvv, county, vehiclevrm, ipv6, creditcardissuer, accountnumber, creditcardnumber. Return the output in JSON format.
### Input:
{input_text}
### Output: """
inputs = tokenizer(model_prompt, return_tensors="pt").to(device)
# Adjust max_new_tokens to your needs; with do_sample=True the output can vary between runs
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=120)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
# The decoded text includes the prompt; the generated JSON follows "### Output:", for example:
# {'middlename': ['Abner'], 'dob': ['23/03/1926'], 'email': ['Nathen15@hotmail.com']}
print(response)
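Because the decoded `response` contains the prompt followed by the generated text, the entity mapping usually has to be pulled out before it can be used. The sketch below shows one possible way to do that. `extract_entities` is a hypothetical helper, not part of transformers. It assumes the generated text after the last `### Output:` marker holds a single braces-delimited object, and it accepts both strict JSON and the single-quoted dict style shown in the example output.

```python
import ast
import json


def extract_entities(response, marker="### Output:"):
    """Return the entity dict found after the last `marker`, or {} if none parses."""
    # Keep only the text generated after the final marker.
    tail = response.rsplit(marker, 1)[-1]
    start, end = tail.find("{"), tail.rfind("}")
    if start == -1 or end <= start:
        return {}
    candidate = tail[start:end + 1]
    # Try strict JSON first, then a Python-literal dict (single quotes).
    for parser in (json.loads, ast.literal_eval):
        try:
            parsed = parser(candidate)
        except (ValueError, SyntaxError):
            continue
        if isinstance(parsed, dict):
            return parsed
    return {}


# Hand-written stand-in for a decoded response, using the example output from this card.
sample = (
    "### Instruction:\n(instruction text)\n### Input:\n(input text)\n"
    "### Output: {'middlename': ['Abner'], 'dob': ['23/03/1926'], "
    "'email': ['Nathen15@hotmail.com']}"
)
print(extract_entities(sample)["email"])
# prints: ['Nathen15@hotmail.com']
```

If generation stops mid-object because `max_new_tokens` is too small, or the model keeps writing after the closing brace, this helper returns an empty dict instead of raising, so callers may want to check for that case.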