Quantization made by Richard Erkhov.

Github

Discord

Request more models

PII-Model-Phi3-Mini - bnb 8bits

Original model description:

license: mit
language:
  - en
pipeline_tag: text-generation
tags:
  - LLM
  - token classification
  - nlp
  - safetensor
  - PyTorch
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: transformers
widget:
  - text: My name is Sylvain and I live in Paris
    example_title: Parisian
  - text: My name is Sarah and I live in London
    example_title: Londoner

PII Detection Model - Phi3 Mini Fine-Tuned

This repository contains a version of the Phi3 Mini model (base model: microsoft/Phi-3-mini-4k-instruct) fine-tuned to detect personally identifiable information (PII). The model is trained to recognize the PII entities listed below, which makes it suited to tasks such as data redaction, privacy protection, and compliance with data protection regulations.
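As a rough illustration of the redaction use case, the sketch below replaces detected values with bracketed labels. It assumes the model's output has already been parsed into a dict that maps entity names to lists of strings, which is the shape of the example output in this card's inference section. The `redact` helper is hypothetical and not part of the model or of transformers.

```python
def redact(text: str, entities: dict) -> str:
    """Replace every detected PII value in `text` with a [LABEL] placeholder."""
    # Flatten {"label": ["value", ...]} into (value, label) pairs
    pairs = [(value, label) for label, values in entities.items() for value in values]
    # Replace longer values first so a short value never clobbers part of a longer one
    for value, label in sorted(pairs, key=lambda pair: len(pair[0]), reverse=True):
        if value:  # skip empty strings, which would match everywhere
            text = text.replace(value, f"[{label.upper()}]")
    return text


sample_text = (
    "Hi Abner, just a reminder that your next primary care appointment is on "
    "23/03/1926. Please confirm by replying to this email Nathen15@hotmail.com."
)
sample_entities = {
    "middlename": ["Abner"],
    "dob": ["23/03/1926"],
    "email": ["Nathen15@hotmail.com"],
}
redacted = redact(sample_text, sample_entities)
print(redacted)
```

Plain string replacement is the simplest option. It redacts every occurrence of a detected value, including occurrences the model did not flag in context, which may or may not be what a given pipeline wants.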

Model Overview

Model Architecture

Detected PII Entities

The model can detect the following PII entities:

  • Personal Information:

    • firstname
    • middlename
    • lastname
    • sex
    • dob (Date of Birth)
    • age
    • gender
    • height
    • eyecolor
  • Contact Information:

    • email
    • phonenumber
    • url
    • username
    • useragent
  • Address Information:

    • street
    • city
    • state
    • county
    • zipcode
    • country
    • secondaryaddress
    • buildingnumber
    • ordinaldirection
  • Geographical Information:

    • nearbygpscoordinate
  • Organizational Information:

    • companyname
    • jobtitle
    • jobarea
    • jobtype
  • Financial Information:

    • accountname
    • accountnumber
    • creditcardnumber
    • creditcardcvv
    • creditcardissuer
    • iban
    • bic
    • currency
    • currencyname
    • currencysymbol
    • currencycode
    • amount
  • Unique Identifiers:

    • pin
    • ssn
    • imei (Phone IMEI)
    • mac (MAC Address)
    • vehiclevin (Vehicle VIN)
    • vehiclevrm (Vehicle VRM)
  • Cryptocurrency Information:

    • bitcoinaddress
    • litecoinaddress
    • ethereumaddress
  • Other Information:

    • ip (IP Address)
    • ipv4
    • ipv6
    • maskednumber
    • password
    • time
    • ordinaldirection
    • prefix

Prompt Format

### Instruction:
  Identify and extract the following PII entities from the text, if present: companyname, pin, currencyname, email, phoneimei, litecoinaddress, currency, eyecolor, street, mac, state, time, vehiclevin, jobarea, date, bic, currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, password, ip, useragent, accountname, city, gender, secondaryaddress, iban, sex, prefix, ipv4, maskednumber, url, username, lastname, creditcardcvv, county, vehiclevrm, ipv6, creditcardissuer, accountnumber, creditcardnumber. Return the output in JSON format.

### Input:
  Greetings, Mason! Let's celebrate another year of wellness on 14/01/1977. Don't miss the event at 176,Apt. 388.

### Output:
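The prompt format above can be assembled programmatically. The following is a minimal sketch: the entity names are copied from the instruction text, while the `build_prompt` helper and its `entities` parameter are hypothetical conveniences. Whether the model behaves well when given only a subset of entities is not stated in this card, so treat that as an untested assumption.

```python
# Entity names copied verbatim from the instruction in the prompt format
PII_ENTITIES = (
    "companyname, pin, currencyname, email, phoneimei, litecoinaddress, currency, "
    "eyecolor, street, mac, state, time, vehiclevin, jobarea, date, bic, "
    "currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, "
    "ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, "
    "ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, "
    "password, ip, useragent, accountname, city, gender, secondaryaddress, iban, "
    "sex, prefix, ipv4, maskednumber, url, username, lastname, creditcardcvv, "
    "county, vehiclevrm, ipv6, creditcardissuer, accountnumber, creditcardnumber"
).split(", ")


def build_prompt(input_text: str, entities=PII_ENTITIES) -> str:
    """Assemble the instruction / input / output prompt for one input text."""
    entity_list = ", ".join(entities)
    return (
        "### Instruction:\n"
        "  Identify and extract the following PII entities from the text, "
        f"if present: {entity_list}. Return the output in JSON format.\n"
        "\n"
        "### Input:\n"
        f"  {input_text}\n"
        "\n"
        "### Output: "
    )


prompt = build_prompt("Greetings, Mason! Let's celebrate another year of wellness on 14/01/1977.")
print(prompt)
```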

Usage

Installation

To use this model, install the transformers library along with PyTorch, which the inference example relies on for its tensors:

pip install transformers torch

Run Inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Run on a GPU when one is available, otherwise on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and model. The model is prompted and decoded with
# generate(), so it is loaded as a causal language model.
tokenizer = AutoTokenizer.from_pretrained("ab-ai/PII-Model-Phi3-Mini")
model = AutoModelForCausalLM.from_pretrained("ab-ai/PII-Model-Phi3-Mini").to(device)


input_text = "Hi Abner, just a reminder that your next primary care appointment is on 23/03/1926. Please confirm by replying to this email Nathen15@hotmail.com."

model_prompt = f"""### Instruction:
    Identify and extract the following PII entities from the text, if present: companyname, pin, currencyname, email, phoneimei, litecoinaddress, currency, eyecolor, street, mac, state, time, vehiclevin, jobarea, date, bic, currencysymbol, currencycode, age, nearbygpscoordinate, amount, ssn, ethereumaddress, zipcode, buildingnumber, dob, firstname, middlename, ordinaldirection, jobtitle, bitcoinaddress, jobtype, phonenumber, height, password, ip, useragent, accountname, city, gender, secondaryaddress, iban, sex, prefix, ipv4, maskednumber, url, username, lastname, creditcardcvv, county, vehiclevrm, ipv6, creditcardissuer, accountnumber, creditcardnumber. Return the output in JSON format.

    ### Input:
    {input_text}

    ### Output: """


inputs = tokenizer(model_prompt, return_tensors="pt").to(device)
# Adjust max_new_tokens to your needs
outputs = model.generate(**inputs, do_sample=True, max_new_tokens=120)
# Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)  # e.g. {'middlename': ['Abner'], 'dob': ['23/03/1926'], 'email': ['Nathen15@hotmail.com']}
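
The example output above is a Python-style dict with single quotes, which strict JSON parsers reject. The sketch below shows one way to turn the response string into a dict: it tries `json.loads` first and falls back to `ast.literal_eval`. The `parse_pii_output` helper is hypothetical, and the assumption that the response contains a single `{...}` span may not hold for every generation.

```python
import ast
import json


def parse_pii_output(response: str) -> dict:
    """Parse the first-to-last {...} span of a model response into a dict.

    Returns an empty dict when no parseable dict is found.
    """
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end <= start:
        return {}
    candidate = response[start:end + 1]
    try:
        parsed = json.loads(candidate)  # strict JSON (double quotes)
    except json.JSONDecodeError:
        try:
            # Fallback for Python-literal output (single quotes)
            parsed = ast.literal_eval(candidate)
        except (ValueError, SyntaxError, TypeError):
            return {}
    return parsed if isinstance(parsed, dict) else {}


example = "{'middlename': ['Abner'], 'dob': ['23/03/1926'], 'email': ['Nathen15@hotmail.com']}"
entities = parse_pii_output(example)
print(entities["email"])
```

`ast.literal_eval` only evaluates Python literals, so it is a safer fallback than `eval` for text produced by a model.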