Fingu-instruct-2 / README.md
metadata
base_model: dunzhang/stella_en_1.5B_v5
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:99000
  - loss:MultipleNegativesSymmetricRankingLoss
widget:
  - source_sentence: >-
      Instruct: Given a web search query, retrieve relevant passages that answer
      the query.

      Query: Glay
    sentences:
      - >-
        The Theory of Good and Evil is a 1907 book about ethics by the English
        philosopher Hastings Rashdall, in which the author expounds a theory he
        calls "ideal utilitarianism". It has been seen as Rashdall's most
        important philosophical work.
      - >-
        GLAY is a Japanese rock band , formed in Hakodate in 1988 . Glay
        primarily composes songs in the rock and pop genres , but they have also
        arranged songs using elements from a wide variety of genres , including
        punk , electronic , R&B , progressive rock , folk , reggae , gospel ,
        and ska . Originally a visual kei band , the group slowly shifted to
        less dramatic attire through the years . As of 2008 , Glay had sold an
        estimated 51 million records ; 28 million singles and 23 million albums
        , making them one of the top ten best-selling artists of all time in
        Japan .
      - >-
        Aashirwad is a 1968 Bollywood film , directed by Hrishikesh Mukherjee .
        The film stars Ashok Kumar and Sanjeev Kumar .   The film is notable for
        its inclusion of a rap-like song performed by Ashok Kumar , `` Rail
        Gaadi '' .
  - source_sentence: >-
      Instruct: Given a web search query, retrieve relevant passages that answer
      the query.

      Query: Indexing does not work with index package
    sentences:
      - >-
        I am trying to do indexing with the following code:              
        \documentclass[a4paper]{article}     \usepackage{index}    
        \makeindex     \newindex{aut}{adx}{and}{Name Index}    
        \begin{document}     Hellow \index[aut]{FiRST}     \printindex[aut]    
        \end{document}      Acccording to documention of the `index` package it
        should work. But makeindex creates empty `.idx` and `.ind`. If I run
        code like this:               \documentclass[a4paper]{article}    
        \usepackage{index}     \makeindex     \begin{document}      Hellow
        \index{FiRST}     \printindex     \end{document}      It runs. But I
        need to have user-defined index. Please help me with it. I've searched
        for several hours on internet, but without success.
      - >-
        Body materials may include, but are not limited to, any of these
        materials:
      - >-
        Berberis aemulans is a shrub endemic to the region of Sichuan in
        southern China. It grows there in thickets and on slopes at elevations
        of 2900-3200 m.Berberis aemulans is a deciduous shrub up to 2 m tall,
        with spines along the branches. Leaves are simple, elliptical to ovate,
        up to 4 cm long, lighter in color on the underside because of a waxy
        layer. Flowers are in simple racemes of only a few flowers. Berries
        egg-shaped, orange, up to 16 mm long.
  - source_sentence: >-
      Instruct: Given a web search query, retrieve relevant passages that answer
      the query.

      Query: Parodi's hemispingus
    sentences:
      - >-
        Another event dubbed a "Battle of the Sexes" took place during the 1998
        Australian Open[51] between Karsten Braasch and the Williams sisters.
        Venus and Serena Williams had claimed that they could beat any male
        player ranked outside the world's top 200, so Braasch, then ranked
        203rd, challenged them both. Braasch was described by one journalist as
        "a man whose training regime centered around a pack of cigarettes and
        more than a couple bottles of ice cold lager".[52][51] The matches took
        place on court number 12 in Melbourne Park,[53] after Braasch had
        finished a round of golf and two shandies. He first took on Serena and
        after leading 5–0, beat her 6–1. Venus then walked on court and again
        Braasch was victorious, this time winning 6–2.[54] Braasch said
        afterwards, "500 and above, no chance". He added that he had played like
        someone ranked 600th in order to keep the game "fun".[55] Braasch said
        the big difference was that men can chase down shots much easier, and
        that men put spin on the ball that the women can't handle. The Williams
        sisters adjusted their claim to beating men outside the top 350.[51]
      - >-
        The Parodi 's hemispingus ( Hemispingus parodii ) is a species of bird
        in the family Thraupidae that is endemic to Peru .   Its natural habitat
        is subtropical or tropical moist montane forests .
      - >-
        I need help because my Minecraft launcher doesn't work... It's been a
        long time I haven't played Minecraft and until now it worked nicely. But
        now that I want to play on it again and I run the launcher, this appears
        (click images to enlarge): ![enter image description
        here](http://i.stack.imgur.com/hvD9R.png) At the bottom left of the
        screen the profile names keep loading (normally my username appears in
        the box) and as you can see I am unable to click on the "Play" button. I
        tried creating another profile but it doesn't work because soon after
        they ask to enter my Minecraft username and password. The password I
        entered disappears and it keeps loading (I've tried waiting like, 30
        minutes and it still doesn't work) so this is definitely not normal.
        ![enter image description here](http://i.stack.imgur.com/yDYjX.png)
        ![enter image description here](http://i.stack.imgur.com/4Nf1L.png)
        ![enter image description here](http://i.stack.imgur.com/T6cJu.png) So
        basically I can't play on Minecraft anymore (version 1.7.9)... P.S. I
        use Windows 7.
  - source_sentence: >-
      Instruct: Given a web search query, retrieve relevant passages that answer
      the query.

      Query: Mahabharata
    sentences:
      - >-
        The epic employs the story within a story structure, otherwise known as
        frametales, popular in many Indian religious and non-religious works. It
        is first recited at Takshashila by the sage Vaiśampāyana,[12][13] a
        disciple of Vyāsa, to the King Janamejaya who is the great-grandson of
        the Pāṇḍava prince Arjuna. The story is then recited again by a
        professional storyteller named Ugraśrava Sauti, many years later, to an
        assemblage of sages performing the 12-year sacrifice for the king
        Saunaka Kulapati in the Naimiśa Forest.
      - >-
        Guncati (Serbian Cyrillic: Гунцати) is a suburban settlement of
        Belgrade, the capital of Serbia. It is located in the municipality of
        Barajevo.Guncati is located west of the municipal seat of Barajevo,
        halfway between the Belgrade-Bar railway and Ibarska magistrala (Highway
        of Ibar).It is a rural settlement with a steady population growth: from
        1,718 (Census 1991) to 2,102 (Census 2002).
      - >-
        Beck 's Brewery , also known as Brauerei Beck & Co. , is a brewery in
        the northern German city of Bremen . In 2001 , Interbrew agreed to buy
        Brauerei Beck for 1.8 billion euro ; at that time it was the fourth
        largest brewer in Germany . US manufacture of Beck 's Brew has been
        based in St. Louis , Missouri , since early 2012 but some customers have
        rebelled against the US market version .   Since 2008 , it has been
        owned by the Interbrew subsidiary of Anheuser-Busch InBev SA/NV .   The
        Beck 's Art Label Campaign has offered artists the opportunity to
        provide designs to replace the brand 's label . It started in London in
        1987 with Gilbert and George . The artists created an art label ,
        because Beck 's sponsored their retrospective at the Hayward Gallery .
        The labels of the 2000 limited edition Beck 's bottles were matching
        their exhibition poster . Other participants of the Art Label Campaign
        are members of the loose group `` Young British Artists '' and nominees
        or winners of the Turner Prize . Damien Hirst for example , designed a
        label for Beck 's in 1995 , showing his famous spots . In 2000 , Tracey
        Emin created a label , which shows herself , posing in a bathtub .
        Furthermore , Rachel Whiteread designed a label in 1993 , presenting her
        artwork `` house '' , which was also financed by Beck 's . The Art Label
        Campaign has also been parodied by Matthew Higgs , who is a member of
        the British art collective `` Bank '' . In the Bank exhibition `` The
        Charge of the Light Brigade '' in 1995 , he brewed a beer , called ``
        Kunstlerbrau '' . In 2012 , Beck 's started giving young and independent
        musicians the opportunity to design a label for the Beck 's bottle .
        Beck 's summer 2009 limited-edition labels were designed by the musical
        groups Hard-Fi and Ladyhawke .
  - source_sentence: >-
      Instruct: Given a web search query, retrieve relevant passages that answer
      the query.

      Query: Ahu A Umi Heiau
    sentences:
      - >-
        The 1967 All-Ireland Intermediate Hurling Championship was the seventh
        staging of the All-Ireland hurling championship. The championship ended
        on 17 September 1967.Tipperary were the defending champions, however,
        they were defeated in the provincial championship. London won the title
        after defeating Cork by 1-9 to 1-5 in the final.
      - >-
        The digit ratio is the ratio of the lengths of different digits or
        fingers typically measured from the midpoint of bottom crease ( where
        the finger joins the hand ) to the tip of the finger . It has been
        suggested by some scientists that the ratio of two digits in particular
        , the 2nd ( index finger ) and 4th ( ring finger ) , is affected by
        exposure to androgens , e.g. , testosterone while in the uterus and that
        this 2D :4 D ratio can be considered a crude measure for prenatal
        androgen exposure , with lower 2D :4 D ratios pointing to higher
        prenatal androgen exposure . The 2D :4 D ratio is calculated by dividing
        the length of the index finger of a given hand by the length of the ring
        finger of the same hand . A longer index finger will result in a ratio
        higher than 1 , while a longer ring finger will result in a ratio lower
        than 1 .   The 2D :4 D digit ratio is sexually dimorphic : although the
        second digit is typically shorter in both females and males , the
        difference between the lengths of the two digits is greater in males
        than in females .   A number of studies have shown a correlation between
        the 2D :4 D digit ratio and various physical and behavioral traits .
      - >-
        Ahu A ʻ Umi Heiau means "shrine at the temple of ʻ Umi" in the Hawaiian
        Language.

SentenceTransformer based on dunzhang/stella_en_1.5B_v5

This is a sentence-transformers model fine-tuned from dunzhang/stella_en_1.5B_v5. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dunzhang/stella_en_1.5B_v5
  • Maximum Sequence Length: 8096 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
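For readers unfamiliar with the similarity function named in the list, here is a minimal, dependency-free sketch of cosine similarity. The toy 4-dimensional vectors are assumptions standing in for the model's 1024-dimensional embeddings; they are not model output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors (illustrative only, not produced by the model).
query = [1.0, 0.0, 2.0, 0.0]
same_direction = [2.0, 0.0, 4.0, 0.0]  # a scaled copy of `query`
orthogonal = [0.0, 3.0, 0.0, 1.0]      # shares no non-zero coordinate with `query`

print(cosine_similarity(query, same_direction))  # close to 1: same direction
print(cosine_similarity(query, orthogonal))      # 0.0: unrelated directions
```

Because the score depends only on direction, scaling an embedding does not change its similarity to other embeddings.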

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8096, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 1536, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
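The printout above describes three stages: a Qwen2 transformer that yields one 1536-dimensional vector per token, mean pooling over tokens, and a bias-enabled linear layer with identity activation that projects to 1024 dimensions. The sketch below illustrates only the last two stages in plain Python. The function names `mean_pool` and `dense`, the tiny sizes, and all numbers are hypothetical; the real layers operate on batched tensors with learned weights.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, skipping padding."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in totals]


def dense(vector, weight, bias):
    """Linear layer with identity activation: weight @ vector + bias."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]


# Toy sizes: hidden size 3 stands in for 1536, output size 2 stands in for 1024.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the third token is padding and is ignored
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
bias = [0.5, -1.0]

pooled = mean_pool(tokens, mask)
embedding = dense(pooled, weight, bias)
print(pooled)     # [2.0, 3.0, 4.0]
print(embedding)  # [2.5, 6.0]
```

Since the pooling configuration lists `'include_prompt': True`, the tokens of the instruction prefix are part of the average as well.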

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace the placeholder below with this model's repository id)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Ahu A Umi Heiau',
    'Ahu A ʻ Umi Heiau means "shrine at the temple of ʻ Umi" in the Hawaiian Language.',
    'The digit ratio is the ratio of the lengths of different digits or fingers typically measured from the midpoint of bottom crease ( where the finger joins the hand ) to the tip of the finger . It has been suggested by some scientists that the ratio of two digits in particular , the 2nd ( index finger ) and 4th ( ring finger ) , is affected by exposure to androgens , e.g. , testosterone while in the uterus and that this 2D :4 D ratio can be considered a crude measure for prenatal androgen exposure , with lower 2D :4 D ratios pointing to higher prenatal androgen exposure . The 2D :4 D ratio is calculated by dividing the length of the index finger of a given hand by the length of the ring finger of the same hand . A longer index finger will result in a ratio higher than 1 , while a longer ring finger will result in a ratio lower than 1 .   The 2D :4 D digit ratio is sexually dimorphic : although the second digit is typically shorter in both females and males , the difference between the lengths of the two digits is greater in males than in females .   A number of studies have shown a correlation between the 2D :4 D digit ratio and various physical and behavioral traits .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
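In the examples in this card, queries carry an `Instruct: ...\nQuery: ...` prefix while passages are passed as plain text. The sketch below shows one way to build such queries and to rank passages once similarity scores are available. The helper names `format_query` and `rank_passages` are hypothetical, and the scores are placeholder numbers chosen for illustration, not output from this model.

```python
INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query."
)


def format_query(query, instruction=INSTRUCTION):
    """Prefix a raw query with the instruction format used in this card."""
    return f"Instruct: {instruction}\nQuery: {query}"


def rank_passages(passages, scores):
    """Return (passage, score) pairs sorted from most to least similar."""
    if len(passages) != len(scores):
        raise ValueError("need exactly one score per passage")
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)


prompt = format_query("Ahu A Umi Heiau")
print(prompt)

# Passages are encoded without any prefix.
passages = ["passage about digit ratios", "passage about Ahu A Umi Heiau"]
# Placeholder scores; in practice take them from model.similarity(...).
placeholder_scores = [0.1, 0.9]
for passage, score in rank_passages(passages, placeholder_scores):
    print(f"{score:.2f}  {passage}")
```

With the real model, the scores for a query would come from the row of the similarity matrix that corresponds to the formatted query.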

Training Logs

| Epoch  | Step | Training Loss | retrival loss |
|:------:|:----:|:-------------:|:-------------:|
| 0.6466 | 500  | 0.0424        | 0.0060        |
| 1.2932 | 1000 | 0.0073        | 0.0040        |
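The metadata tags name `MultipleNegativesSymmetricRankingLoss` as the training objective. As a rough illustration of the idea, the sketch below treats every other passage in a batch as a negative for a given query, applies cross-entropy in both the query-to-passage and passage-to-query directions, and averages the two. This is a simplified plain-Python sketch, not the library's implementation; the function names are hypothetical, and `scale=20.0` is an assumption, since the value used for this model is not stated.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]


def symmetric_ranking_loss(sim, scale=20.0):
    """sim[i][j] is the cosine similarity of query i and passage j.

    Matching pairs sit on the diagonal; all other entries act as negatives.
    """
    n = len(sim)
    scaled = [[scale * s for s in row] for row in sim]
    # Query -> passage direction: each row should peak on its diagonal entry.
    q2p = sum(cross_entropy(scaled[i], i) for i in range(n)) / n
    # Passage -> query direction: each column should peak on its diagonal entry.
    columns = [list(col) for col in zip(*scaled)]
    p2q = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (q2p + p2q) / 2


# Toy similarity matrices (illustrative only).
well_separated = [[1.0, 0.0], [0.0, 1.0]]
indistinct = [[0.5, 0.5], [0.5, 0.5]]
print(symmetric_ranking_loss(well_separated))  # near zero
print(symmetric_ranking_loss(indistinct))      # log(2), about 0.693
```

A loss near zero therefore means that, within a batch, each query scores its own passage far above the others, and the reverse also holds.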