metadata

library_name: setfit
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
metrics:
  - accuracy
widget:
  - text: ' '
  - text: quantitative algorithmic hustle trading dot com
  - text: cryptoart since early 2020  founder of ENCODE_graphics red_heart EARTH
  - text: >-
      Chief Legal Officer krakenfx Not your lawyer Assumptions opinions
      prevarications and predictions are mine not my employers 
  - text: >-
      Chief of Staff at Remilia Corporation remiliacorp333 Warlord Commander at
      YAYO Corporation YayoCorp THIS IS NOT A PROMISE OF EQUITY OR OWNERSHIP IN
      ANYTHING 
pipeline_tag: text-classification
inference: true
base_model: BAAI/bge-small-en-v1.5
model-index:
  - name: SetFit with BAAI/bge-small-en-v1.5
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          name: Unknown
          type: unknown
          split: test
        metrics:
          - type: accuracy
            value: 0.4891640866873065
            name: Accuracy

SetFit with BAAI/bge-small-en-v1.5

This is a SetFit model for text classification. It uses BAAI/bge-small-en-v1.5 as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
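
The contrastive step works on sentence pairs built from the labeled examples. The following is a minimal sketch of that pairing idea only, not SetFit's actual implementation: `sample_contrastive_pairs` is a hypothetical helper, and the four texts are taken from the label examples in this card.

```python
import random

def sample_contrastive_pairs(texts, labels, num_iterations=1, seed=42):
    """Build (text_a, text_b, target) triples for contrastive fine-tuning.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration; not part of the setfit package.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, (text, label) in enumerate(zip(texts, labels)):
            same = [j for j, other in enumerate(labels) if other == label and j != i]
            diff = [j for j, other in enumerate(labels) if other != label]
            if same:  # positive pair: another example with the same label
                pairs.append((text, texts[rng.choice(same)], 1.0))
            if diff:  # negative pair: an example with a different label
                pairs.append((text, texts[rng.choice(diff)], 0.0))
    return pairs

texts = [
    "Epic Games founder and CEO",
    "CEO compoundfinance San Francisco USA",
    "research paradigm",
    "Crypto Data Research 0x",
]
labels = ["EXECUTIVE", "EXECUTIVE", "RESEARCHER", "RESEARCHER"]

pairs = sample_contrastive_pairs(texts, labels, num_iterations=2)
for text_a, text_b, target in pairs[:4]:
    print(f"{target:.0f}  {text_a!r}  <->  {text_b!r}")
```

In the real pipeline, pairs like these are used to fine-tune the embedding model so that same-label texts end up closer together. The classification head is then fit on the resulting embeddings.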

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: BAAI/bge-small-en-v1.5
Classification head: a LogisticRegression instance
Maximum Sequence Length: 512 tokens
Number of Classes: 27

Model Sources

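Repository: SetFit on GitHub (https://github.com/huggingface/setfit)
Paper: Efficient Few-Shot Learning Without Prompts (https://arxiv.org/abs/2209.11055)
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts (https://huggingface.co/blog/setfit)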

Model Labels

Label	Examples
NFT_ARTIST	'Onchain music cooprecordsxyz invinmusic cooprecsmusic Los Angeles' 'Artist etc Chicago' 'Where NFTs meet DeFi on SecretNetwork Are you Legendary nfts gamefi defi SCRT LGND Secret Network'
UNDETERMINED	'100 readerfunded writer All works free to republish bootleg use or cp All works coauthored by Tim Foley Patreon Australia' 'Pro skater father husband videogame character CEO philanthropist public skatepark advocate Old AF and still skating San Diego world at large' 'Sandy and blue '
DEVELOPER	'BUIDL HODL CELR CelerNetwork BNB bnb48club BTC ETC ATOM No gods no masters only hashes Shadowy Coder trolling by day coding by night Metalhead Cosmos' 'crosspost to Farcaster Bluesky Twitter Lens Threads decentralized social in one feed iOS ' 'VP of Engineering at avara space nerd my astrobin profile Opinions strictly my own '
EXECUTIVE	'Epic Games founder and CEO ' 'CEO compoundfinance San Francisco USA' 'DeFi FounderUnited_States bornraised Bahai Los Angeles'
INFLUENCER	'That rug really tied the room together weekly news channel on hiatus ' 'Conecto con personas de habla hispana con perfil propio dentro de bitcoin y comparto su valor preguntasbtc para respuestas en 24h lunaticoingetalbycom 2515 4D1C 6C36 C024' 'sols 学习新事物不带偏见的去看币圈 macau'
BUSINESS_DEVELOPER	'Chief of Growth fuel_network Ex0xMantle Ex0xPolygon Tea connoisseur Dog Lover Web3 Degen Views are my own Metaverse' 'Experience Bitcoin like never before ' 'Bitcoin Blockchain bitcoinmining Bitpay Founded the Original WomenInBitcoin etc Los Angeles '
TRADER	'your greater fool goblin town' 'Commander in chiefing Periodically 1 ranked trader on ByBit Zaza City' 'Ethereum Maximalist Synthetix Spartan '
ONCHAIN_ANALYST	'technical and onchain analyst crypto stock real estate investor global head of news beincrypto spreading alpha United States' 'Cofounder reflexivityres acq by defitechglobal Using velodata New York NY' 'Cofounder ensuser research y2z_ventures Full time onchain moron ethereum'
RESEARCHER	'Lets skip witty repartee discuss fundamental questions Views are mine not GMUs or Virginias Books Fairfax VA' 'research paradigm ' 'Crypto Data Research 0x'
INVESTOR	'peer to peer electronic cash enthusiast light__nh hethey ' 'GoldLover ' 'I enjoy business innovation lifelong learning to ChangeTheWorld to help others Entrepreneur interim CxO investor adventurer thinker doer NO DMs Global citizen'
SECURITY_AUDITOR	'security researcher nascentsecurity EVM Enthusiast Gas Optimizoor Puzzle Cracker Fan of all things Static Analysis Fuzzing Symbolic Execution ' 'think bad do good cofounder openpathsec los angeles' 'Head of GTM CyfrinAudits Ex Lead Dev Rel AlchemyPlatform Created cyfrinupdraft and AlchemyLearn Making web3 mainstream Ethereum'
EDUCATOR	'Professor of Practice at Harvard Teaches Ec 10 some tweets might be educational Also Senior Fellow PIIE Was Chair of President Obamas CEA Cambridge MA' 'Jarrête des carrières Je vulgarise et décortique à vos côtés les nouvelles tokenomics et les influvoleurs crypto de notre époque Bitcoin' ' Bretton Woods NH'
LAWYER	'Author of Digital Money Demystified DickinsonLaw AdvantageEvans AtTechIntersect Crypto IP Law As seen on Coindesk TV Yahoo Finance Bloomberg CNBC Nomad Team ' 'UVa Vanderbilt Law Your guide to other worlds Crypto l Metaverse Web3 Not legal or financial advice I am A lawyerjust not YOUR lawyer USA' 'The Crypto Lawyer Юрист Rechtsanwältin محامية Advising Entrepreneurs Investors and Governments on Bitcoin Crypto since 2016 Contributor Forbes UAE Switzerland '
ADVISOR	'Director of Government Relations at BlockchainAssn Author of the Token Taxonomy Act Former WarrenDavidson and Board of Advisors JoinSeedstarter Washington DC' 'Calculated Degen 2x cancer survivor 5x rug pull survivor Paper hands diamond wrist Building WumboLabs Advisoooor arcade_xyz ' 'Doggfather Analytics Founderorange_square OrdData '
COMMUNITY_MANAGER	'Contributor to the Optimism Collective OP' 'ecosystem growth indexcoop music NFT enjoyer wavWRLD_ not financial advice typos are my owm ' 'Founder KryptoSeoul ericaplanet Ericaverse Organizer buidl_asia eth_seoul_ Seoul Chapter Lead She__Fi Alum Stanford Ewha Where is Erica'
MARKETER	'Positivity Pusher CoFounder PurpleHorizons Future Tech Marketing Strategist Trend Spotter Storyteller Once a DJ Always a DJ Miami FL' 'elissa emm Head of Marketing at spruceid building decentralized identity Seattle WA' 'Marketing Superfluid_HQ Safaryclub solhotgirlclub She__Fi Cohort 9 Words in banklesshq PFP miladymaker 104 NFA Views are my own Brooklyn NY'
ANGEL_INVESTOR	'Developer entrepreneur angel investor crypto enthusiast ' 'larp LawliettesLab angel uvocapital ' 'cofounder jokerace_io ecodao_ write on web3 angel thecowfund berlin'
VENTURE_CAPITALIST	'visionary at core playful at the surface just launched GetCohosts WalkinEvents prev fabric_vc nothingnyc London UK New York USA' 'Crypto web3 Partner ColliderVC Standing on the shoulders of giants World State' 'startup investor and builder founder w_conviction before GP greylockVC accelerating AI adoption tech podcastchains'
NFT_COLLECTOR	'FINE BITCOIN GOODS Get in THE BANTER Scarce City' 'Cofounder RKOTax omega based spicymargeth ' 'Like a shadow following the light Time is actually another dimension Nikennftyeth Niken32lens bcard id 275 Multiverse of Madness'
BLOGGER	'Reporter at Bloomberg business covering crypto blockchain companies Formerly CoinDesk DM open Opinions are my own New York USA' 'viamirror ' 'senior writer NFT lead BanklessHQ '
METAVERSE_ENTHUSIAST	'SMOL by Treasure_DAO Smolverse' 'Epic SciFi MMO strategy game from Pixelmatic ExordiumHQ Take command of a fleet of spaceships and fight for humanity NOW Sol System' 'Time to post tweets and save lives Creative Director PlayShadowWar Where dreams come true'
FINANCIAL_ANALYST	'Editor of FTAlphaville Norwegian despite the Harry Potteresque name Author of TRILLIONS Views mine bla bla Oslo Norway' 'markets macro business anchor of 10am ET and ETF IQ on bloombergtv haverfordedu columbiajourn alum ktkaos on InstaThreads Opinions mine Midtown East Manhattan' 'Curious on how behavioral fallacies challenge financial markets and cryptos Always learning new things getting to know new people and having a bit of fun London England'
DATA_SCIENTIST	' ' 'Data Wizardry variantfund Chicago IL' 'NLP ML StatArb Math Bowdoincollege Team Doobro_CN Prev first intern Bybit_Official Plucking a feather from every goose but follow no one absolute New York NY'
NODE_OPERATOR	'Founder ClayStack_HQ Building Liquid Staking long before they were called LSDs Running validator nodes at Vibing ClayClanDAO Metaverse' 'Restake ETH Never Worry about EigenLayer Caps EigenLayer' 'HonigdachsPod cohost Making Bitcoin green today netposmon Find me on nostr npub1cear2n95zcyze86s5hry2a0pdgs7euhnc0p7ewcq2284pp845t5szt8rhr '
SHITCOINER	'Eternity belongs to those who live in the present I tweet once per week when Im pooping Results in occasional shitposting ' 'Lets hold hands and be enemies enemieswithbenefitseth ' '16th Chair of the Central Bank of Retards When I see chaos forming on the timeline I rush in to shitpost adding fuel to the fire Hyperbolic Time Chamber'
MINER	' bitcoin beyonder economic futurist metagame winner rose' 'Steady lads deploying more hashrate Hashrate merchant luxortechnology btc Miami' 'SVP foundryservices I am a miner like my father before me previously greenidge_GREE Bitcoin '
DATA_ANALYST	'Director of Research at proof_xyz Building charts that make NFTs a bit easier to understand ' 'Shadowy mediocre Analyst tangent_xyz ' 'Lead Analyst CryptoSlate Previously Saidler Bitcoin London'

Evaluation

Metrics

Label	Accuracy
all	0.4892

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

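from setfit import SetFitModel

# Download the model from the 🤗 Hub
model = SetFitModel.from_pretrained("kasparas12/crypto_individual_infer_model_setfit")
# Run inference on a single profile description (taken from the widget examples)
preds = model("quantitative algorithmic hustle trading dot com")
print(preds)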

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	2	13.6494	65

Label	Training Sample Count
DEVELOPER	702
DATA_SCIENTIST	34
DATA_ANALYST	8
NODE_OPERATOR	18
MINER	22
SECURITY_AUDITOR	129
INVESTOR	212
ANGEL_INVESTOR	84
VENTURE_CAPITALIST	467
TRADER	168
SHITCOINER	34
BUSINESS_DEVELOPER	306
BUSINESS_ANALYST	0
COMMUNITY_MANAGER	122
MARKETER	70
FINANCIAL_ANALYST	32
ADVISOR	79
RESEARCHER	227
ONCHAIN_ANALYST	29
EXECUTIVE	393
INFLUENCER	510
LAWYER	47
BLOGGER	55
NFT_COLLECTOR	174
NFT_ARTIST	312
EDUCATOR	134
METAVERSE_ENTHUSIAST	57
UNDETERMINED	740
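
The counts above are heavily imbalanced, which is worth keeping in mind when reading the single accuracy figure. The sketch below only summarizes the table; the counts are copied from it, and the 30-sample cutoff for "rare" labels is an arbitrary choice for illustration.

```python
# Per-label training sample counts copied from the table above.
train_counts = {
    "DEVELOPER": 702, "DATA_SCIENTIST": 34, "DATA_ANALYST": 8,
    "NODE_OPERATOR": 18, "MINER": 22, "SECURITY_AUDITOR": 129,
    "INVESTOR": 212, "ANGEL_INVESTOR": 84, "VENTURE_CAPITALIST": 467,
    "TRADER": 168, "SHITCOINER": 34, "BUSINESS_DEVELOPER": 306,
    "BUSINESS_ANALYST": 0, "COMMUNITY_MANAGER": 122, "MARKETER": 70,
    "FINANCIAL_ANALYST": 32, "ADVISOR": 79, "RESEARCHER": 227,
    "ONCHAIN_ANALYST": 29, "EXECUTIVE": 393, "INFLUENCER": 510,
    "LAWYER": 47, "BLOGGER": 55, "NFT_COLLECTOR": 174, "NFT_ARTIST": 312,
    "EDUCATOR": 134, "METAVERSE_ENTHUSIAST": 57, "UNDETERMINED": 740,
}

total = sum(train_counts.values())
majority_label, majority_count = max(train_counts.items(), key=lambda kv: kv[1])
empty_labels = [label for label, n in train_counts.items() if n == 0]
rare_labels = sorted(label for label, n in train_counts.items() if 0 < n < 30)

print(f"total training samples: {total}")
print(f"largest class: {majority_label} ({majority_count / total:.1%} of samples)")
print(f"labels with no samples: {empty_labels}")
print(f"labels with fewer than 30 samples: {rare_labels}")
```

The largest class covers roughly 14% of the training samples, and BUSINESS_ANALYST has none, which matches the 27 classes listed under Model Labels. The class distribution of the test split is not given in this card, so these shares are rough context only.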

Training Hyperparameters

batch_size: (64, 64)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: oversampling
num_iterations: 20
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
seed: 42
eval_max_steps: -1
load_best_model_at_end: False

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0011	1	0.2537	-
0.0562	50	0.2412	-
0.1124	100	0.2242	-
0.1685	150	0.2066	-
0.2247	200	0.1811	-
0.2809	250	0.205	-
0.3371	300	0.1789	-
0.3933	350	0.1831	-
0.4494	400	0.1829	-
0.5056	450	0.1506	-
0.5618	500	0.1474	-
0.6180	550	0.0989	-
0.6742	600	0.1094	-
0.7303	650	0.1316	-
0.7865	700	0.1207	-
0.8427	750	0.1262	-
0.8989	800	0.1229	-
0.9551	850	0.0989	-
0.0003	1	0.2061	-
0.0155	50	0.2073	-
0.0310	100	0.1844	-
0.0465	150	0.1891	-
0.0619	200	0.1975	-
0.0774	250	0.1772	-
0.0929	300	0.2304	-
0.1084	350	0.2085	-
0.1239	400	0.1851	-
0.1394	450	0.1463	-
0.1548	500	0.1216	-
0.1703	550	0.1648	-
0.1858	600	0.1359	-
0.2013	650	0.163	-
0.2168	700	0.1563	-
0.2323	750	0.2	-
0.2478	800	0.1425	-
0.2632	850	0.1614	-
0.2787	900	0.1881	-
0.2942	950	0.133	-
0.3097	1000	0.1348	-
0.3252	1050	0.1256	-
0.3407	1100	0.1065	-
0.3561	1150	0.0932	-
0.3716	1200	0.122	-
0.3871	1250	0.0969	-
0.4026	1300	0.1386	-
0.4181	1350	0.1116	-
0.4336	1400	0.0866	-
0.4491	1450	0.084	-
0.4645	1500	0.1073	-
0.4800	1550	0.1065	-
0.4955	1600	0.1063	-
0.5110	1650	0.1235	-
0.5265	1700	0.0918	-
0.5420	1750	0.078	-
0.5574	1800	0.1358	-
0.5729	1850	0.0664	-
0.5884	1900	0.1123	-
0.6039	1950	0.0996	-
0.6194	2000	0.0471	-
0.6349	2050	0.1068	-
0.6504	2100	0.0933	-
0.6658	2150	0.0836	-
0.6813	2200	0.0858	-
0.6968	2250	0.0421	-
0.7123	2300	0.08	-
0.7278	2350	0.0902	-
0.7433	2400	0.0949	-
0.7587	2450	0.116	-
0.7742	2500	0.0733	-
0.7897	2550	0.101	-
0.8052	2600	0.0709	-
0.8207	2650	0.079	-
0.8362	2700	0.0706	-
0.8517	2750	0.0338	-
0.8671	2800	0.0812	-
0.8826	2850	0.063	-
0.8981	2900	0.075	-
0.9136	2950	0.081	-
0.9291	3000	0.1264	-
0.9446	3050	0.0766	-
0.9600	3100	0.0873	-
0.9755	3150	0.0512	-
0.9910	3200	0.0816	-
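
The Epoch values in the rows that run to step 3200 can be cross-checked against the hyperparameters and the training sample counts. The arithmetic below is a sketch under two assumptions that this card does not state: each iteration produces one positive and one negative pair per training sample, and the final partial batch is kept. The earlier rows, which stop at step 850, do not fit this sample count and may come from a separate, shorter run.

```python
import math

n_samples = 5165      # sum of the Training Sample Count column
num_iterations = 20   # from Training Hyperparameters
batch_size = 64       # first element of batch_size (64, 64), the embedding phase

# Assumption: one positive and one negative pair per sample per iteration.
n_pairs = 2 * num_iterations * n_samples
# Assumption: the last, partially filled batch still counts as a step.
steps_per_epoch = math.ceil(n_pairs / batch_size)

print(n_pairs)                           # 206600
print(steps_per_epoch)                   # 3229
print(round(3200 / steps_per_epoch, 4))  # 0.991, matching the last row of the table
```

Under these assumptions one epoch is 3229 steps, so step 50 falls at epoch 0.0155 and step 3200 at epoch 0.9910, as logged.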

Framework Versions

Python: 3.9.16
SetFit: 1.0.3
Sentence Transformers: 2.2.2
Transformers: 4.21.3
PyTorch: 1.12.1+cu116
Datasets: 2.4.0
Tokenizers: 0.12.1

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}