mpnet-step2 / README.md
metadata
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:128997
  - loss:MultipleNegativesRankingLoss
base_model: suhwan3/mpnet_step1
widget:
  - source_sentence: >-
      The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P
      500 Risk Managed Income Index by investing at least 80% of its assets in
      index securities. The index's strategy involves holding the underlying
      stocks of the S&P 500 Index while applying an options collar, specifically
      selling at-the-money covered call options and buying monthly 5%
      out-of-the-money put options corresponding to the portfolio's value. This
      approach aims to generate income, ideally resulting in a net credit from
      the options premiums, and provide risk management, though selling
      at-the-money calls inherently caps the fund's potential for upside
      participation.
    sentences:
      - >-
        Nasdaq, Inc. operates as a technology company that serves capital
        markets and other industries worldwide. The Market Technology segment
        includes anti financial crime technology business, which offers Nasdaq
        Trade Surveillance, a SaaS solution for brokers and other market
        participants to assist them in complying with market rules, regulations,
        and internal market surveillance policies; Nasdaq Automated
        Investigator, a cloud-deployed anti-money laundering tool; and Verafin,
        a SaaS technology provider of anti-financial crime management solutions.
        This segment also handles assets, such as cash equities, equity
        derivatives, currencies, interest-bearing securities, commodities,
        energy products, and digital currencies. The Investment Intelligence
        segment sells and distributes historical and real-time market data;
        develops and licenses Nasdaq-branded indexes and financial products; and
        provides investment insights and workflow solutions. The Corporate
        Platforms segment operates listing platforms; and offers investor
        relations intelligence and governance solutions. As of December 31,
        2021, it had 4,178 companies listed securities on The Nasdaq Stock
        Market, including 1,632 listings on The Nasdaq Global Select Market;
        1,169 on The Nasdaq Global Market; and 1,377 on The Nasdaq Capital
        Market. The Market Services segment includes equity derivative trading
        and clearing, cash equity trading, fixed income and commodities trading
        and clearing, and trade management service businesses. This segment
        operates various exchanges and other marketplace facilities across
        various asset classes, which include derivatives, commodities, cash
        equity, debt, structured products, and exchange traded products; and
        provides broker, clearing, settlement, and central depository services.
        The company was formerly known as The NASDAQ OMX Group, Inc. and changed
        its name to Nasdaq, Inc. in September 2015. Nasdaq, Inc. was founded in
        1971 and is headquartered in New York, New York.
      - >-
        Jabil Inc. provides manufacturing services and solutions worldwide. The
        company operates in two segments, Electronics Manufacturing Services and
        Diversified Manufacturing Services. It offers electronics design,
        production, and product management services. The company provides
        electronic design services, such as application-specific integrated
        circuit design, firmware development, and rapid prototyping services;
        and designs plastic and metal enclosures that include the
        electro-mechanics, such as the printed circuit board assemblies (PCBA).
        It also specializes in the three-dimensional mechanical design
        comprising the analysis of electronic, electro-mechanical, and optical
        assemblies, as well as offers various industrial design, mechanism
        development, and tooling management services. In addition, the company
        provides computer-assisted design services consisting of PCBA design, as
        well as PCBA design validation and verification services; and other
        consulting services, such as the generation of a bill of materials,
        approved vendor list, and assembly equipment configuration for various
        PCBA designs. Further, it offers product and process validation
        services, such as product system, product safety, regulatory compliance,
        and reliability tests, as well as manufacturing test solution
        development services. Additionally, the company provides systems
        assembly, test, direct-order fulfillment, and configure-to-order
        services. It serves 5G, wireless and cloud, digital print and retail,
        industrial and semi-cap, networking and storage, automotive and
        transportation, connected devices, healthcare and packaging, and
        mobility industries. The company was formerly known as Jabil Circuit,
        Inc. and changed its name to Jabil Inc. in June 2017. Jabil Inc. was
        founded in 1966 and is headquartered in Saint Petersburg, Florida.
      - >-
        Realty Income, The Monthly Dividend Company, is an S&P 500 company
        dedicated to providing stockholders with dependable monthly income. The
        company is structured as a REIT, and its monthly dividends are supported
        by the cash flow from over 6,500 real estate properties owned under
        long-term lease agreements with our commercial clients. To date, the
        company has declared 608 consecutive common stock monthly dividends
        throughout its 52-year operating history and increased the dividend 109
        times since Realty Income's public listing in 1994 (NYSE: O). The
        company is a member of the S&P 500 Dividend Aristocrats index.
        Additional information about the company can be obtained from the
        corporate website at www.realtyincome.com.
  - source_sentence: >-
      The iShares U.S. Telecommunications ETF (IYZ) seeks to track the
      investment results of the Russell 1000 Telecommunications RIC 22.5/45
      Capped Index, which measures the performance of the U.S.
      telecommunications sector of the U.S. equity market as defined by FTSE
      Russell. This market-cap-weighted index includes large-cap companies
      involved in telecom equipment and service provision and is subject to
      regulatory capping that limits single holdings to 22.5% and aggregate
      large holdings to 45%. The fund generally invests at least 80% of its
      assets in the component securities of its underlying index and is
      non-diversified; the underlying index is rebalanced quarterly.
    sentences:
      - >-
        Kanzhun Limited operates an online recruitment platform, BOSS Zhipin in
        the People's Republic of China. Its recruitment platform assists the
        recruitment process between job seekers and employers for enterprises,
        and corporations. The company was founded in 2013 and is headquartered
        in Beijing, the People's Republic of China.
      - >-
        Frontier Communications Parent, Inc., together with its subsidiaries,
        provides communications services for consumer and business customers in
        25 states in the United States. It offers data and Internet, voice,
        video, and other services. The company was formerly known as Frontier
        Communications Corporation and changed its name to Frontier
        Communications Parent, Inc. in April 2021. Frontier Communications
        Parent, Inc. was incorporated in 1935 and is based in Norwalk,
        Connecticut.
      - >-
        Broadcom Inc. designs, develops, and supplies various semiconductor
        devices with a focus on complex digital and mixed signal complementary
        metal oxide semiconductor based devices and analog III-V based products
        worldwide. The company operates in two segments, Semiconductor Solutions
        and Infrastructure Software. It provides set-top box system-on-chips
        (SoCs); cable, digital subscriber line, and passive optical networking
        central office/consumer premise equipment SoCs; wireless local area
        network access point SoCs; Ethernet switching and routing merchant
        silicon products; embedded processors and controllers;
        serializer/deserializer application specific integrated circuits;
        optical and copper, and physical layers; and fiber optic transmitter and
        receiver components. The company also offers RF front end modules,
        filters, and power amplifiers; Wi-Fi, Bluetooth, and global positioning
        system/global navigation satellite system SoCs; custom touch
        controllers; serial attached small computer system interface, and
        redundant array of independent disks controllers and adapters;
        peripheral component interconnect express switches; fiber channel host
        bus adapters; read channel based SoCs; custom flash controllers;
        preamplifiers; and optocouplers, industrial fiber optics, and motion
        control encoders and subsystems. Its products are used in various
        applications, including enterprise and data center networking, home
        connectivity, set-top boxes, broadband access, telecommunication
        equipment, smartphones and base stations, data center servers and
        storage systems, factory automation, power generation and alternative
        energy systems, and electronic displays. Broadcom Inc. was incorporated
        in 2018 and is headquartered in San Jose, California.
  - source_sentence: >-
      The Xtrackers MSCI Emerging Markets ESG Leaders Equity ETF tracks an index
      of large- and mid-cap emerging market stocks that emphasize strong
      environmental, social, and governance (ESG) characteristics. The index
      first excludes companies involved in specific controversial industries.
      From the remaining universe, it ranks stocks based on MSCI ESG scores,
      including a controversy component, to identify and select the
      highest-ranking ESG leaders, effectively screening out ESG laggards. To
      maintain market-like country and sector weights, the index selects the top
      ESG-scoring stocks within each sector until a specified market
      capitalization threshold is reached. Selected stocks are then weighted by
      market capitalization within their respective sectors. The fund typically
      invests over 80% of its assets in the securities of this underlying index.
    sentences:
      - >-
        Info Edge (India) Limited operates as an online classifieds company in
        the areas of recruitment, matrimony, real estate, and education and
        related services in India and internationally. It operates through
        Recruitment Solutions, 99acres, and Other segments. The company offers
        recruitment services through naukri.com, an online job website for job
        seekers and corporate customers, including hiring consultants;
        firstnaukri.com, a job search network for college students and recent
        graduates; naukrigulf.com, a website catering to Gulf markets; and
        quadranglesearch.com, a site that provides off-line placement services
        to middle and senior management, as well as Highorbit/iimjobs.com,
        zwayam.com, hirist.com, doselect.com, ambitionbox.com, bigshyft.com, and
        jobhai.com. It also provides 99acres.com, which offers listing of
        properties for sale, purchase, and rent; Jeevansathi.com, an online
        matrimonial classifieds services; and shiksha.com, an education
        classified website that helps students to decide their undergraduate and
        postgraduate options by providing useful information on careers, exams,
        colleges, and courses, as well as operates multiple dating platforms on
        the web through its mobile apps Aisle, Anbe, Arike and HeyDil. In
        addition, the company provides internet, computer, and electronic and
        related services; and software development, consultancy, technical
        support for consumer companies, SAAS providers, and other services in
        the field of information technology and product development, as well as
        brokerage services in the real estate sector. Further, it acts as an
        investment adviser and manager, financial and management consultant, and
        sponsor of alternative investment funds, as well as provides advertising
        space for colleges and universities on www.shiksha.com. Info Edge
        (India) Limited was incorporated in 1995 and is based in Noida, India.
      - >-
        China Overseas Land & Investment Limited, an investment holding company,
        engages in the property development and investment, and other operations
        in the People's Republic of China and the United Kingdom. The company
        operates through Property Development, Property Investment, and Other
        Operations segments. It is involved in the investment, development, and
        rental of residential and commercial properties; issuance of guaranteed
        notes and corporate bonds; and hotel operation activities. The company
        also provides construction and building design consultancy services. In
        addition, it engages in the investment and financing, land
        consolidation, regional planning, engineering construction, industrial
        import, commercial operation, and property management. Further, the
        company offers urban services, including office buildings, flexible
        working space, shopping malls, star-rated hotels, long-term rental
        apartments, logistics parks, and architectural design and construction.
        The company was founded in 1979 and is based in Central, Hong Kong.
        China Overseas Land & Investment Limited is a subsidiary of China
        Overseas Holdings Limited.
      - >-
        Mastercard Incorporated, a technology company, provides transaction
        processing and other payment-related products and services in the United
        States and internationally. It facilitates the processing of payment
        transactions, including authorization, clearing, and settlement, as well
        as delivers other payment-related products and services. The company
        offers integrated products and value-added services for account holders,
        merchants, financial institutions, businesses, governments, and other
        organizations, such as programs that enable issuers to provide consumers
        with credits to defer payments; prepaid programs and management
        services; commercial credit and debit payment products and solutions;
        and payment products and solutions that allow its customers to access
        funds in deposit and other accounts. It also provides value-added
        products and services comprising cyber and intelligence solutions for
        parties to transact, as well as proprietary insights, drawing on
        principled use of consumer, and merchant data services. In addition, the
        company offers analytics, test and learn, consulting, managed services,
        loyalty, processing, and payment gateway solutions for e-commerce
        merchants. Further, it provides open banking and digital identity
        platforms services. The company offers payment solutions and services
        under the MasterCard, Maestro, and Cirrus. Mastercard Incorporated was
        founded in 1966 and is headquartered in Purchase, New York.
  - source_sentence: >-
      The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P
      500 Risk Managed Income Index by investing at least 80% of its assets in
      index securities. The index's strategy involves holding the underlying
      stocks of the S&P 500 Index while applying an options collar, specifically
      selling at-the-money covered call options and buying monthly 5%
      out-of-the-money put options corresponding to the portfolio's value. This
      approach aims to generate income, ideally resulting in a net credit from
      the options premiums, and provide risk management, though selling
      at-the-money calls inherently caps the fund's potential for upside
      participation.
    sentences:
      - >-
        Incyte Corporation, a biopharmaceutical company, focuses on the
        discovery, development, and commercialization of proprietary
        therapeutics in the United States and internationally. The company
        offers JAKAFI, a drug for the treatment of myelofibrosis and
        polycythemia vera; PEMAZYRE, a fibroblast growth factor receptor kinase
        inhibitor that act as oncogenic drivers in various liquid and solid
        tumor types; and ICLUSIG, a kinase inhibitor to treat chronic myeloid
        leukemia and philadelphia-chromosome positive acute lymphoblastic
        leukemia. Its clinical stage products include ruxolitinib, a
        steroid-refractory chronic graft-versus-host-diseases (GVHD);
        itacitinib, which is in Phase II/III clinical trial to treat naive
        chronic GVHD; and pemigatinib for treating bladder cancer,
        cholangiocarcinoma, myeloproliferative syndrome, and tumor agnostic. In
        addition, the company engages in developing Parsaclisib, which is in
        Phase II clinical trial for follicular lymphoma, marginal zone lymphoma,
        and mantel cell lymphoma. Additionally, it develops Retifanlimab that is
        in Phase II clinical trials for MSI-high endometrial cancer, merkel cell
        carcinoma, and anal cancer, as well as in Phase II clinical trials for
        patients with non-small cell lung cancer. It has collaboration
        agreements with Novartis International Pharmaceutical Ltd.; Eli Lilly
        and Company; Agenus Inc.; Calithera Biosciences, Inc; MacroGenics, Inc.;
        Merus N.V.; Syros Pharmaceuticals, Inc.; Innovent Biologics, Inc.; Zai
        Lab Limited; Cellenkos, Inc.; and Nimble Therapeutics, as well as
        clinical collaborations with MorphoSys AG and Xencor, Inc. to
        investigate the combination of tafasitamab, plamotamab, and lenalidomide
        in patients with relapsed or refractory diffuse large B-cell lymphoma,
        and relapsed or refractory follicular lymphoma. The company was
        incorporated in 1991 and is headquartered in Wilmington, Delaware.
      - >-
        Omnicom Group Inc., together with its subsidiaries, provides
        advertising, marketing, and corporate communications services. It
        provides a range of services in the areas of advertising, customer
        relationship management, public relations, and healthcare. The company's
        services include advertising, branding, content marketing, corporate
        social responsibility consulting, crisis communications, custom
        publishing, data analytics, database management, digital/direct
        marketing, digital transformation, entertainment marketing, experiential
        marketing, field marketing, financial/corporate business-to-business
        advertising, graphic arts/digital imaging, healthcare marketing and
        communications, and in-store design services. Its services also comprise
        interactive marketing, investor relations, marketing research, media
        planning and buying, merchandising and point of sale, mobile marketing,
        multi-cultural marketing, non-profit marketing, organizational
        communications, package design, product placement, promotional
        marketing, public affairs, retail marketing, sales support, search
        engine marketing, shopper marketing, social media marketing, and sports
        and event marketing services. It operates in the United States, Canada,
        Puerto Rico, South America, Mexico, Europe, the Middle East, Africa,
        Australia, Greater China, India, Japan, Korea, New Zealand, Singapore,
        and other Asian countries. The company was incorporated in 1944 and is
        based in New York, New York.
      - >-
        NetApp, Inc. provides cloud-led and data-centric services to manage and
        share data on-premises, and private and public clouds worldwide. It
        operates in two segments, Hybrid Cloud and Public Could. The company
        offers intelligent data management software, such as NetApp ONTAP,
        NetApp Snapshot, NetApp SnapCenter Backup Management, NetApp SnapMirror
        Data Replication, NetApp SnapLock Data Compliance, NetApp ElementOS
        software, and NetApp SANtricity software; and storage infrastructure
        solutions, including NetApp All-Flash FAS series, NetApp Fabric Attached
        Storage, NetApp FlexPod, NetApp E/EF series, NetApp StorageGRID, and
        NetApp SolidFire. It also provides cloud storage and data services
        comprising NetApp Cloud Volumes ONTAP, Azure NetApp Files, Amazon FSx
        for NetApp ONTAP, NetApp Cloud Volumes Service for Google Cloud, NetApp
        Cloud Sync, NetApp Cloud Tiering, NetApp Cloud Backup, NetApp Cloud Data
        Sense, and NetApp Cloud Volumes Edge Cache; and cloud operations
        services, such as NetApp Cloud Insights, Spot Ocean Kubernetes Suite,
        Spot Security, Spot Eco, and Spot CloudCheckr. In addition, the company
        offers application-aware data management service under the NetApp Astra
        name; and professional and support services, such as strategic
        consulting, professional, managed, and support services. Further, it
        provides assessment, design, implementation, and migration services. The
        company serves the energy, financial service, government, technology,
        internet, life science, healthcare service, manufacturing, media,
        entertainment, animation, video postproduction, and telecommunication
        markets through a direct sales force and an ecosystem of partners.
        NetApp, Inc. was incorporated in 1992 and is headquartered in San Jose,
        California.
  - source_sentence: >-
      The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P
      500 Risk Managed Income Index by investing at least 80% of its assets in
      index securities. The index's strategy involves holding the underlying
      stocks of the S&P 500 Index while applying an options collar, specifically
      selling at-the-money covered call options and buying monthly 5%
      out-of-the-money put options corresponding to the portfolio's value. This
      approach aims to generate income, ideally resulting in a net credit from
      the options premiums, and provide risk management, though selling
      at-the-money calls inherently caps the fund's potential for upside
      participation.
    sentences:
      - >-
        Walgreens Boots Alliance, Inc. operates as a pharmacy-led health and
        beauty retail company. It operates through two segments, the United
        States and International. The United States segment sells prescription
        drugs and an assortment of retail products, including health, wellness,
        beauty, personal care, consumable, and general merchandise products
        through its retail drugstores. It also provides central specialty
        pharmacy services and mail services. As of August 31, 2021, this segment
        operated 8,965 retail stores under the Walgreens and Duane Reade brands
        in the United States; and five specialty pharmacies. The International
        segment sells prescription drugs; and health and wellness, beauty,
        personal care, and other consumer products through its pharmacy-led
        health and beauty retail stores and optical practices, as well as
        through boots.com and an integrated mobile application. It also engages
        in pharmaceutical wholesaling and distribution business in Germany. As
        of August 31, 2021, this segment operated 4,031 retail stores under the
        Boots, Benavides, and Ahumada in the United Kingdom, Thailand, Norway,
        the Republic of Ireland, the Netherlands, Mexico, and Chile; and 548
        optical practices, including 160 on a franchise basis. Walgreens Boots
        Alliance, Inc. was founded in 1901 and is based in Deerfield, Illinois.
      - >-
        Middlesex Water Company owns and operates regulated water utility and
        wastewater systems. It operates in two segments, Regulated and
        Non-Regulated. The Regulated segment collects, treats, and distributes
        water on a retail and wholesale basis to residential, commercial,
        industrial, and fire protection customers, as well as provides regulated
        wastewater systems in New Jersey and Delaware. The Non-Regulated segment
        provides non-regulated contract services for the operation and
        maintenance of municipal and private water and wastewater systems in New
        Jersey and Delaware. The company was incorporated in 1896 and is
        headquartered in Iselin, New Jersey.
      - >-
        Liberty Broadband Corporation engages in the communications businesses.
        It operates through GCI Holdings and Charter segments. The GCI Holdings
        segment provides a range of wireless, data, video, voice, and managed
        services to residential customers, businesses, governmental entities,
        and educational and medical institutions primarily in Alaska under the
        GCI brand. The Charter segment offers subscription-based video services
        comprising video on demand, high-definition television, and digital
        video recorder service; local and long-distance calling, voicemail, call
        waiting, caller ID, call forwarding, and other voice services, as well
        as international calling services; and Spectrum TV. It also provides
        internet services, including an in-home Wi-Fi product that provides
        customers with high-performance wireless routers and managed Wi-Fi
        services; advanced community Wi-Fi; mobile internet; and a security
        suite that offers protection against computer viruses and spyware. In
        addition, this segment offers internet access, data networking, fiber
        connectivity to cellular towers and office buildings, video
        entertainment, and business telephone services; advertising services on
        cable television networks and digital outlets; and operates regional
        sports and news networks. Liberty Broadband Corporation was incorporated
        in 2014 and is based in Englewood, Colorado.
datasets:
  - hobbang/stage2-dataset
pipeline_tag: sentence-similarity
library_name: sentence-transformers

SentenceTransformer based on suhwan3/mpnet_step1

This is a sentence-transformers model fine-tuned from suhwan3/mpnet_step1 on stage2-dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: suhwan3/mpnet_step1
  • Maximum Sequence Length: 384 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: stage2-dataset

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
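The architecture above applies mean pooling over token embeddings (pooling_mode_mean_tokens is True) and then a Normalize() step. The following is a minimal pure-Python sketch of those two operations on toy data. It is an illustration only, not the library's implementation: the helper names mean_pool and l2_normalize are hypothetical, and the 2-dimensional vectors stand in for the model's 768-dimensional token embeddings.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(value * value for value in vector))
    return [value / max(norm, eps) for value in vector]

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)          # padding token is ignored
sentence_embedding = l2_normalize(pooled)  # unit-length sentence vector
```

Because the final vector has unit length, the cosine similarity between two such embeddings reduces to their dot product.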

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
# NOTE: "sentence_transformers_model_id" is a placeholder; replace it with this model's Hub ID.
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.",
    'Walgreens Boots Alliance, Inc. operates as a pharmacy-led health and beauty retail company. It operates through two segments, the United States and International. The United States segment sells prescription drugs and an assortment of retail products, including health, wellness, beauty, personal care, consumable, and general merchandise products through its retail drugstores. It also provides central specialty pharmacy services and mail services. As of August 31, 2021, this segment operated 8,965 retail stores under the Walgreens and Duane Reade brands in the United States; and five specialty pharmacies. The International segment sells prescription drugs; and health and wellness, beauty, personal care, and other consumer products through its pharmacy-led health and beauty retail stores and optical practices, as well as through boots.com and an integrated mobile application. It also engages in pharmaceutical wholesaling and distribution business in Germany. As of August 31, 2021, this segment operated 4,031 retail stores under the Boots, Benavides, and Ahumada in the United Kingdom, Thailand, Norway, the Republic of Ireland, the Netherlands, Mexico, and Chile; and 548 optical practices, including 160 on a franchise basis. Walgreens Boots Alliance, Inc. was founded in 1901 and is based in Deerfield, Illinois.',
    'Liberty Broadband Corporation engages in the communications businesses. It operates through GCI Holdings and Charter segments. The GCI Holdings segment provides a range of wireless, data, video, voice, and managed services to residential customers, businesses, governmental entities, and educational and medical institutions primarily in Alaska under the GCI brand. The Charter segment offers subscription-based video services comprising video on demand, high-definition television, and digital video recorder service; local and long-distance calling, voicemail, call waiting, caller ID, call forwarding, and other voice services, as well as international calling services; and Spectrum TV. It also provides internet services, including an in-home Wi-Fi product that provides customers with high-performance wireless routers and managed Wi-Fi services; advanced community Wi-Fi; mobile internet; and a security suite that offers protection against computer viruses and spyware. In addition, this segment offers internet access, data networking, fiber connectivity to cellular towers and office buildings, video entertainment, and business telephone services; advertising services on cable television networks and digital outlets; and operates regional sports and news networks. Liberty Broadband Corporation was incorporated in 2014 and is based in Englewood, Colorado.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
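The widget examples pair one fund description with several company descriptions, so a common use is to rank candidates against a single query by cosine similarity (the similarity function listed under Model Description). The sketch below shows that ranking step on toy 3-dimensional vectors using only the standard library. The function names cosine_similarity and rank_candidates are hypothetical helpers, and in practice the vectors would presumably be rows of the array returned by model.encode.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_candidates(query_embedding, candidate_embeddings):
    """Return (candidate_index, score) pairs, highest similarity first."""
    scores = [cosine_similarity(query_embedding, c) for c in candidate_embeddings]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]

# Toy stand-ins for one query embedding and three candidate embeddings.
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned
]

ranking = rank_candidates(query, candidates)
for index, score in ranking:
    print(index, round(score, 3))
```

Cosine similarity ignores vector length, which is why the second candidate ranks first even though its magnitude differs from the query's.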

Training Details

Training Dataset

stage2-dataset

  • Dataset: stage2-dataset at cd393c2
  • Size: 128,997 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (type: string): min 101 tokens, mean 143.15 tokens, max 186 tokens
    • positive (type: string): min 35 tokens, mean 238.69 tokens, max 384 tokens
  • Samples:
    • anchor: The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.
      positive: JPMorgan Chase & Co. operates as a financial services company worldwide. It operates through four segments: Consumer & Community Banking (CCB), Corporate & Investment Bank (CIB), Commercial Banking (CB), and Asset & Wealth Management (AWM). The CCB segment offers s deposit, investment and lending products, payments, and services to consumers; lending, deposit, and cash management and payment solutions to small businesses; mortgage origination and servicing activities; residential mortgages and home equity loans; and credit card, auto loan, and leasing services. The CIB segment provides investment banking products and services, including corporate strategy and structure advisory, and equity and debt markets capital-raising services, as well as loan origination and syndication; payments and cross-border financing; and cash and derivative instruments, risk management solutions, prime brokerage, and research. This segment also offers securities services, including custody, fund accounting ...
    • anchor: The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.
      positive: JPMorgan Chase & Co. operates as a financial services company worldwide. It operates through four segments: Consumer & Community Banking (CCB), Corporate & Investment Bank (CIB), Commercial Banking (CB), and Asset & Wealth Management (AWM). The CCB segment offers s deposit, investment and lending products, payments, and services to consumers; lending, deposit, and cash management and payment solutions to small businesses; mortgage origination and servicing activities; residential mortgages and home equity loans; and credit card, auto loan, and leasing services. The CIB segment provides investment banking products and services, including corporate strategy and structure advisory, and equity and debt markets capital-raising services, as well as loan origination and syndication; payments and cross-border financing; and cash and derivative instruments, risk management solutions, prime brokerage, and research. This segment also offers securities services, including custody, fund accounting ...
    • anchor: The Invesco Financial Preferred ETF (PGF) seeks to track the ICE Exchange-Listed Fixed Rate Financial Preferred Securities Index, primarily by investing at least 90% of its total assets in the securities comprising the index. The underlying index is market capitalization weighted and designed to track the performance of exchange-listed, fixed rate, U.S. dollar denominated preferred securities, including functionally equivalent instruments, issued by U.S. financial companies. PGF provides a concentrated portfolio exclusively focused on financial-sector preferred securities and is considered non-diversified, holding both investment- and non-investment-grade securities within this focus.
      positive: The Allstate Corporation, together with its subsidiaries, provides property and casualty, and other insurance products in the United States and Canada. The company operates through Allstate Protection; Protection Services; Allstate Health and Benefits; and Run-off Property-Liability segments. The Allstate Protection segment offers private passenger auto and homeowners insurance; other personal lines products; and commercial lines products under the Allstate and Encompass brand names. The Protection Services segment provides consumer product protection plans and related technical support for mobile phones, consumer electronics, furniture, and appliances; finance and insurance products, including vehicle service contracts, guaranteed asset protection waivers, road hazard tire and wheel, and paint and fabric protection; towing, jump-start, lockout, fuel delivery, and tire change services; device and mobile data collection services; data and analytic solutions using automotive telematics i...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
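
To make the two parameters concrete, here is a minimal pure-Python sketch of the idea behind this loss: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by `scale`, and cross-entropy is taken with the matching positive as the correct class. The function names and the toy 2-D vectors are illustrative assumptions; the library's own implementation works on batched tensors and may differ in detail.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-D "embeddings", made up for illustration.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive points the same way as its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive resembles the other anchor

loss_aligned = mnr_loss(anchors, aligned)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

The aligned batch yields a loss near zero and the swapped batch a much larger one, which is the behaviour the in-batch negatives are meant to produce.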
    

Evaluation Dataset

stage2-dataset

  • Dataset: stage2-dataset at cd393c2
  • Size: 16,944 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (type: string): min 135 tokens, mean 149.21 tokens, max 214 tokens
    • positive (type: string): min 42 tokens, mean 252.75 tokens, max 384 tokens
  • Samples:
    • anchor: The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.
      positive: Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, and HomePod. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription ...
    • anchor: The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.
      positive: Microsoft Corporation develops, licenses, and supports software, services, devices, and solutions worldwide. The company operates in three segments: Productivity and Business Processes, Intelligent Cloud, and More Personal Computing. The Productivity and Business Processes segment offers Office, Exchange, SharePoint, Microsoft Teams, Office 365 Security and Compliance, Microsoft Viva, and Skype for Business; Skype, Outlook.com, OneDrive, and LinkedIn; and Dynamics 365, a set of cloud-based and on-premises business solutions for organizations and enterprise divisions. The Intelligent Cloud segment licenses SQL, Windows Servers, Visual Studio, System Center, and related Client Access Licenses; GitHub that provides a collaboration platform and code hosting service for developers; Nuance provides healthcare and enterprise AI solutions; and Azure, a cloud platform. It also offers enterprise support, Microsoft consulting, and nuance professional services to assist customers in developing, de...
    • anchor: The Global X S&P 500 Risk Managed Income ETF seeks to track the Cboe S&P 500 Risk Managed Income Index by investing at least 80% of its assets in index securities. The index's strategy involves holding the underlying stocks of the S&P 500 Index while applying an options collar, specifically selling at-the-money covered call options and buying monthly 5% out-of-the-money put options corresponding to the portfolio's value. This approach aims to generate income, ideally resulting in a net credit from the options premiums, and provide risk management, though selling at-the-money calls inherently caps the fund's potential for upside participation.
      positive: NVIDIA Corporation provides graphics, and compute and networking solutions in the United States, Taiwan, China, and internationally. The company's Graphics segment offers GeForce GPUs for gaming and PCs, the GeForce NOW game streaming service and related infrastructure, and solutions for gaming platforms; Quadro/NVIDIA RTX GPUs for enterprise workstation graphics; vGPU software for cloud-based visual and virtual computing; automotive platforms for infotainment systems; and Omniverse software for building 3D designs and virtual worlds. Its Compute & Networking segment provides Data Center platforms and systems for AI, HPC, and accelerated computing; Mellanox networking and interconnect solutions; automotive AI Cockpit, autonomous driving development agreements, and autonomous vehicle solutions; cryptocurrency mining processors; Jetson for robotics and other embedded platforms; and NVIDIA AI Enterprise and other software. The company's products are used in gaming, professional visualizat...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 32
  • learning_rate: 3e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • dataloader_drop_last: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
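
As a worked example, the batch size, `dataloader_drop_last`, and `warmup_ratio` together determine how many optimizer steps one epoch takes and how long warmup lasts. The sketch below assumes a single training device and assumes warmup steps are the ratio times the total steps, rounded up; neither is stated in this card, although the single-device assumption is consistent with the epoch fractions in the training logs (step 1800 at epoch 0.8933).

```python
import math

train_samples = 128_997        # size of the training dataset
per_device_batch_size = 64     # per_device_train_batch_size
num_devices = 1                # assumption: single device
num_train_epochs = 1
warmup_ratio = 0.1

# dataloader_drop_last=True discards the final incomplete batch.
steps_per_epoch = train_samples // (per_device_batch_size * num_devices)
total_steps = steps_per_epoch * num_train_epochs

# Assumption: warmup length is the ratio of total steps, rounded up.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(steps_per_epoch, total_steps, warmup_steps)

# Fraction of the epoch completed at step 1800, for comparison with the training logs.
print(round(1800 / steps_per_epoch, 4))
```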

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
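
The combination of `learning_rate: 3e-05`, `lr_scheduler_type: linear`, and `warmup_ratio: 0.1` describes a learning rate that rises linearly during warmup and then decays linearly to zero. The sketch below illustrates that shape. The defaults `total_steps=2015` and `warmup_steps=202` are assumptions derived from 128,997 samples at batch size 64 on a single device with the last partial batch dropped, and `linear_schedule_lr` is a hypothetical helper, not a library function.

```python
def linear_schedule_lr(step, base_lr=3e-05, total_steps=2015, warmup_steps=202):
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# Sample the schedule at the start, mid-warmup, peak, a late step, and the end.
for step in (0, 101, 202, 1800, 2015):
    print(step, linear_schedule_lr(step))
```

Under these assumptions the peak rate of 3e-05 is reached at step 202 and the rate returns to zero at step 2015.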

Training Logs

Epoch Step Training Loss Validation Loss
0.0050 10 4.6656 -
0.0099 20 4.4733 -
0.0149 30 4.0093 -
0.0199 40 3.9259 -
0.0248 50 3.8315 -
0.0298 60 3.673 -
0.0347 70 3.5076 -
0.0397 80 3.4416 -
0.0447 90 3.4362 -
0.0496 100 3.3934 -
0.0546 110 3.3343 -
0.0596 120 3.3018 -
0.0645 130 3.2882 -
0.0695 140 3.3027 -
0.0744 150 3.2177 -
0.0794 160 3.2708 -
0.0844 170 3.2645 -
0.0893 180 3.1939 -
0.0943 190 3.0575 -
0.0993 200 3.0799 -
0.1042 210 3.0824 -
0.1092 220 3.0693 -
0.1141 230 3.1014 -
0.1191 240 3.0458 -
0.1241 250 3.04 -
0.1290 260 3.0311 -
0.1340 270 2.9778 -
0.1390 280 3.0701 -
0.1439 290 2.9039 -
0.1489 300 3.0449 2.5685
0.1538 310 2.8896 -
0.1588 320 3.0527 -
0.1638 330 3.0153 -
0.1687 340 2.869 -
0.1737 350 2.9678 -
0.1787 360 2.9756 -
0.1836 370 2.9348 -
0.1886 380 2.9967 -
0.1935 390 2.8953 -
0.1985 400 2.9546 -
0.2035 410 2.9919 -
0.2084 420 2.8487 -
0.2134 430 2.7609 -
0.2184 440 2.9126 -
0.2233 450 2.8991 -
0.2283 460 2.9272 -
0.2333 470 2.9084 -
0.2382 480 2.7963 -
0.2432 490 2.822 -
0.2481 500 2.9376 -
0.2531 510 2.8969 -
0.2581 520 2.7745 -
0.2630 530 2.8103 -
0.2680 540 2.8189 -
0.2730 550 2.8322 -
0.2779 560 2.7627 -
0.2829 570 2.7796 -
0.2878 580 2.8515 -
0.2928 590 2.8758 -
0.2978 600 2.7963 2.4142
0.3027 610 2.8259 -
0.3077 620 2.829 -
0.3127 630 2.7699 -
0.3176 640 2.7311 -
0.3226 650 2.735 -
0.3275 660 2.7306 -
0.3325 670 2.7467 -
0.3375 680 2.7494 -
0.3424 690 2.7386 -
0.3474 700 2.8513 -
0.3524 710 2.673 -
0.3573 720 2.8101 -
0.3623 730 2.7527 -
0.3672 740 2.7213 -
0.3722 750 2.753 -
0.3772 760 2.8034 -
0.3821 770 2.8288 -
0.3871 780 2.613 -
0.3921 790 2.7315 -
0.3970 800 2.8077 -
0.4020 810 2.7442 -
0.4069 820 2.7351 -
0.4119 830 2.7643 -
0.4169 840 2.8984 -
0.4218 850 2.7377 -
0.4268 860 2.7021 -
0.4318 870 2.6756 -
0.4367 880 2.7852 -
0.4417 890 2.7531 -
0.4467 900 2.6636 2.3456
0.4516 910 2.7089 -
0.4566 920 2.8029 -
0.4615 930 2.721 -
0.4665 940 2.5606 -
0.4715 950 2.6397 -
0.4764 960 2.6563 -
0.4814 970 2.7163 -
0.4864 980 2.6225 -
0.4913 990 2.645 -
0.4963 1000 2.6576 -
0.5012 1010 2.7019 -
0.5062 1020 2.7195 -
0.5112 1030 2.7242 -
0.5161 1040 2.6729 -
0.5211 1050 2.7637 -
0.5261 1060 2.677 -
0.5310 1070 2.7018 -
0.5360 1080 2.6469 -
0.5409 1090 2.7186 -
0.5459 1100 2.6728 -
0.5509 1110 2.6694 -
0.5558 1120 2.7839 -
0.5608 1130 2.5834 -
0.5658 1140 2.6905 -
0.5707 1150 2.7223 -
0.5757 1160 2.7235 -
0.5806 1170 2.636 -
0.5856 1180 2.6314 -
0.5906 1190 2.5941 -
0.5955 1200 2.7827 2.2911
0.6005 1210 2.6104 -
0.6055 1220 2.6148 -
0.6104 1230 2.6355 -
0.6154 1240 2.6269 -
0.6203 1250 2.6003 -
0.6253 1260 2.6256 -
0.6303 1270 2.6326 -
0.6352 1280 2.681 -
0.6402 1290 2.5776 -
0.6452 1300 2.7528 -
0.6501 1310 2.6076 -
0.6551 1320 2.5784 -
0.6600 1330 2.6064 -
0.6650 1340 2.5757 -
0.6700 1350 2.5851 -
0.6749 1360 2.6007 -
0.6799 1370 2.5674 -
0.6849 1380 2.6984 -
0.6898 1390 2.6202 -
0.6948 1400 2.6729 -
0.6998 1410 2.6683 -
0.7047 1420 2.6355 -
0.7097 1430 2.6033 -
0.7146 1440 2.6834 -
0.7196 1450 2.6597 -
0.7246 1460 2.6298 -
0.7295 1470 2.6232 -
0.7345 1480 2.5672 -
0.7395 1490 2.5139 -
0.7444 1500 2.6248 2.3090
0.7494 1510 2.6417 -
0.7543 1520 2.6197 -
0.7593 1530 2.6911 -
0.7643 1540 2.5542 -
0.7692 1550 2.6584 -
0.7742 1560 2.6182 -
0.7792 1570 2.6301 -
0.7841 1580 2.5629 -
0.7891 1590 2.5965 -
0.7940 1600 2.5722 -
0.7990 1610 2.5835 -
0.8040 1620 2.5901 -
0.8089 1630 2.6055 -
0.8139 1640 2.6019 -
0.8189 1650 2.6421 -
0.8238 1660 2.6049 -
0.8288 1670 2.5351 -
0.8337 1680 2.6158 -
0.8387 1690 2.5994 -
0.8437 1700 2.5816 -
0.8486 1710 2.5848 -
0.8536 1720 2.6138 -
0.8586 1730 2.5811 -
0.8635 1740 2.5933 -
0.8685 1750 2.5869 -
0.8734 1760 2.5464 -
0.8784 1770 2.6842 -
0.8834 1780 2.6312 -
0.8883 1790 2.5621 -
0.8933 1800 2.6103 2.2858
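
Because `load_best_model_at_end` is True, it can be useful to see which evaluation step had the lowest validation loss. The sketch below copies the six evaluation rows from the log and picks the minimum. It assumes validation loss is the selection metric, which this card does not state explicitly.

```python
# (step, validation loss) pairs copied from the evaluation rows of the training log.
eval_rows = [
    (300, 2.5685),
    (600, 2.4142),
    (900, 2.3456),
    (1200, 2.2911),
    (1500, 2.3090),
    (1800, 2.2858),
]

# Lowest validation loss across the logged evaluations.
best_step, best_loss = min(eval_rows, key=lambda row: row[1])
print(best_step, best_loss)

# Change in validation loss between consecutive evaluations.
deltas = [
    (later[0], round(later[1] - earlier[1], 4))
    for earlier, later in zip(eval_rows, eval_rows[1:])
]
print(deltas)
```

Validation loss falls at every evaluation except step 1500, where it rises slightly before reaching its lowest logged value at step 1800.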

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.6.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}