us-address-matching-model / system_promts.txt
felix
add 32 from gpt4
152fc5b
Create similar addresses:
----------------------------
You are tasked with helping to generate test data for a machine learning dataset. Do not prefix output lines with numbers. The user will prompt with a sample US address that omits the city, state, and ZIP code part. Your task is to generate 10 variations of the provided address as a regular person might enter it into some system. The output should be a list of 10 samples.
Create similar but different addresses:
You are tasked with helping to generate test data for a machine learning dataset. Do not prefix output lines with numbers. The user will prompt with a sample US address that omits the city, state, and ZIP code part. Your task is to generate 10 variations of the provided address that are actually different addresses. Address formats should be as a regular person might enter them into some system.
Generate completely different addresses:
You are tasked with helping to generate test data for a machine learning dataset. Do not prefix output lines with numbers. The user will prompt with a sample US address that omits the city, state, and ZIP code part. Your task is to generate 10 random addresses that are inspired by the structure of the address the user provided. Address formats should be as a regular person might enter them into some system.
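The three prompts above ask for 10 addresses, one per line, with no numeric prefixes. Models do not always comply, so a small post-processing step can help. The following is a minimal sketch, not part of the original prompts: the names `parse_variations` and `_LIST_MARKER` are hypothetical, and it assumes the model reply arrives as a single string.

```python
import re

# Matches list markers such as "1. ", "2) " or "- " at the start of a line.
# A bare house number ("123 MAIN ST") has no "." or ")" right after the
# digits, so it is left untouched.
_LIST_MARKER = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+")


def parse_variations(reply: str, expected: int = 10) -> list[str]:
    """Split a model reply into address lines, dropping list markers and blanks."""
    lines = []
    for raw in reply.splitlines():
        line = _LIST_MARKER.sub("", raw).strip()
        if line:
            lines.append(line)
    if len(lines) != expected:
        raise ValueError(f"expected {expected} addresses, got {len(lines)}")
    return lines
```

Raising on a wrong line count is one possible design choice; a caller could instead retry the request or keep whatever lines were returned.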
Your task is to generate training samples based on an evaluation set provided for a neural network training problem.
The format of the evaluation set is as follows.
1. A set of rows separated by newlines.
2. Each row has the following structure:
US address 1|US address 2|1
or
US address 1|US address 2|0
The first and second addresses are compared to determine whether they are the same address as a user might
have typed it into some system. The last field is 1 if they are the same address and 0 if they are not.
Users cannot alter the format or position of the two-letter state code or the five-digit ZIP code.
All letters are always capitalized, so user inputs are always uppercased for the purposes of training data.
Given the following evaluation set, generate 50 rows of training data that would produce a network with high accuracy but
that are not so similar that they would cause overfitting to the evaluation set.
Evaluation set:
1061 SCHMIDT LN, NORTH BRUNSWICK TOWNSHIP, NJ 08902|1061 SCHMIDT LANE, NORTH BRUNSWICK TOWNSHIP, NJ 08902|1
1061 SCHMIDT LN, NORTH BRUNSWICK TOWNSHIP, NJ 08902|934 SCHMIDT LN, NORTH BRUNSWICK TOWNSHIP, NJ 08902|0
115 DR JIMMY CARR ST, STE 3F, SEARCY, AR 72143|115 DR JIMMY CARR ST, SEARCY, AR 72143|0
14143 WINECUP LN, HOUSTON, TX 77047|14121 WINECUP LANE, HOUSTON, TX 77047|0
1555 RUTH RD STE 5, NORTH BRUNSWICK TOWNSHIP, NJ 08902|1555 RUTH ROAD SUIT 5, NORTH BRUNSWICK TOWNSHIP, NJ 08902|1
1555 RUTH RD STE 5, NORTH BRUNSWICK TOWNSHIP, NJ 08902|1558 RUTH ROAD STE 5, NORTH BRUNSWICK TOWNSHIP, NJ 08902|0
17752 MAIN ST, HANOVER, PA 23393|177 52 MAIN ST, HANOVER, PA 23393|1
17752 MAIN ST, HANOVER, PA 23393|177 52 MAIN STREET, HANOVER, PA 23393|1
217-12 LOUDON RD, CONCORD, NH 03301|217-22 LOUDON RD, CONCORD, NH 03301|0
2575 US HWY 43, ST 3-A, WINFIELD, AL 35594|25-75 US HWY 43, STREET 3A, WINFIELD, AL 35594|1
2575 US HWY 43, ST 3-A, WINFIELD, AL 35594|2575 US HWY 43, ST 3B, WINFIELD, AL 35594|0
440 TECHNOLOGY CENTER DRIVE, BOSTON, MA 10034|200 TECHNOLOGY CENTER DRIVE, BOSTON, MA 10034|0
440 TECHNOLOGY CENTER DRIVE, BOSTON, MA 10034|440 TECHNOLOGY CENTER DR., BOSTON, MA 10034|1
440 TECHNOLOGY CENTER DRIVE, BOSTON, MA 10034|87 TECHNOLOGY CENTER DRIVE, BOSTON, MA 10034|0
545 16TH ST, STE 3, GULFPORT, MS 39507|545 16TH ST, FLOOR 3, GULFPORT, MS 39507|0
5844 N ORANGE BLOSSOM TRAIL, ORLANDO, FL 32810|5844 NORTH ORANGE BLOSSOM TRAIL, ORLANDO, FL 32810|1
65 MOUNTAIN BLVD EXT, WARREN, NJ 07059|112 MOUNTAIN BLVD EXT, WARREN, NJ 07059|0
65 MOUNTAIN BLVD EXT, WARREN, NJ 07059|5078 S MARYLAND PKWY, LAS VEGAS, NV 89119|0
65 MOUNTAIN BLVD EXT, WARREN, NJ 07059|65 MOUNTAIN BOULEVARD EXT, WARREN, NJ 07059|1
6701 FANNIN ST #1400, HOUSTON, TX 77030|6701 FANNIN STE #1400, HOUSTON, TX 77030|1
87 24 ROUTE 13, CORTLANDVILLE, NY 13045|87-24 ROUTE 13, CORTLANDVILLE, NY 13045|1
87-43 ROUTE 13, CORTLANDVILLE, NY 13045|8724 ROUTE 13, CORTLANDVILLE, NY 13045|0
87-44 ROUTE 13, CORTLANDVILLE, NY 13045|87 24 ROUTE 13, CORTLANDVILLE, NY 13045|0
872 ROUTE 13, CORTLANDVILLE, NY 13045|87-2 ROUTE 13, CORTLANDVILLE ,NY 13045|1
8724 ROUTE 13, CORTLANDVILLE, NY 13045|87-24 ROUTE 13, CORTLANDVILLE, NY 13045|1
HEART HEALTH, 90 N COLUMBUS AVE, LOUISVILLE, MS 39339|90 N COLUMBUS AVE, LOUISVILLE, MS 39339|1
115 34 SHOREWAY DR, QUEENSTOWN, MD 21658|115-43 SHOREWAY DR, QUEENSTOWN, MD 21658|0
112 24 SHOREWAY DR, QUEENSTOWN, MD 21658|112-24 SHOREWAY DR, QUEENSTOWN, MD 21658|1
3619 S 22ND DR, YUMA, AZ 85364|3636 S 22ND DR, YUMA, AZ 85364|0
7325 FRANKLIN BLVD, SACRAMENTO, CA 95823|73235 FRANKLIN BLVD, SACRAMENTO, CA 95823|0
3660 MAIN ST, TUCSON, AZ 85721|3701 MAIN ST, TUCSON, AZ 85721|0
3910 MAGNET RD, MALVERN, AR 72104|3910 MAGNET RD, STE 206 MALVERN, AR 72104|0
15702 OBERLIN RD, RALEIGH, NC 27605|15702 OBERLIN RD FL 1, RALEIGH, NC 27605|1
14425 ROOSOVELT AVE APT 322, LA JOLLA, CA 92092|14325 ROOSOVELT AVE, LA JOLLA, CA 92092|0
14425 ROOSOVELT AVE APT 322, LA JOLLA, CA 92092|144-25 ROOSOVELT AVE APT 322, LA JOLLA, CA 92092|1
14425 ROOSOVELT AVE, LA JOLLA, CA 92092|144-25A ROOSOVELT AVENUE, LA JOLLA, CA 92092|0
Training samples:
Good, but now, instead of varying only one part of the address such as the building number, vary two or more
parts of the address. Some parts should have real differences (for example, the building number may have one
different character), while other variations should come from spelling STREET as STR or from other common
variations that don't actually change the meaning of the address. Remember to generate both
positive and negative pairs that fit the evaluation set. Generate 50 samples.
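
Generated rows can be checked against the row format described in the training-sample prompt above (`US address 1|US address 2|label`, uppercase only, ending in a two-letter state code and a five-digit ZIP code). The following is a minimal sketch under those assumptions. The names `Pair`, `parse_row`, and `parse_rows` are hypothetical and do not come from the repository.

```python
import re
from typing import NamedTuple


class Pair(NamedTuple):
    first: str
    second: str
    label: int  # 1 = same address, 0 = different address


# Two-letter state code followed by a five-digit ZIP code at the end of the address.
_TAIL = re.compile(r"\b[A-Z]{2} \d{5}$")


def parse_row(row: str) -> Pair:
    """Parse one 'ADDRESS 1|ADDRESS 2|LABEL' row and validate its format."""
    parts = row.strip().split("|")
    if len(parts) != 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {row!r}")
    first, second, label = (p.strip() for p in parts)
    if label not in ("0", "1"):
        raise ValueError(f"label must be 0 or 1: {row!r}")
    for addr in (first, second):
        if addr != addr.upper():
            raise ValueError(f"address is not uppercased: {addr!r}")
        if not _TAIL.search(addr):
            raise ValueError(f"address does not end in 'ST 12345': {addr!r}")
    return Pair(first, second, int(label))


def parse_rows(text: str) -> list[Pair]:
    """Parse every non-empty line of a newline-separated block of rows."""
    return [parse_row(line) for line in text.splitlines() if line.strip()]
```

A caller might additionally drop generated pairs that appear verbatim in the evaluation set. That only catches exact duplicates, so it is a weak guard against the overfitting the prompt warns about.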