---
license: apache-2.0
metrics:
  - accuracy
pipeline_tag: text-classification
tags:
  - LSGAttention
base_model: notdiamond/notdiamond-0001
inference: true
language:
  - en
  - hi
  - fr
  - it
  - de
  - es
widget:
  - text: >-
      Summarize the following Elo rating system Article From Wikipedia in 300
      words:   ''' the free encyclopedia Arpad Elo, the inventor of the Elo
      rating system The Elo[a] rating system is a method for calculating the
      relative skill levels of players in zero-sum games such as chess. It is
      named after its creator Arpad Elo, a Hungarian-American physics professor.
      The Elo system was invented as an improved chess-rating system over the
      previously used Harkness system,[1] but is also used as a rating system in
      association football, American football, baseball, basketball, pool, table
      tennis, various board games and esports, and more recently large language
      models. The difference in the ratings between two players serves as a
      predictor of the outcome of a match. Two players with equal ratings who
      play against each other are expected to score an equal number of wins. A
      player whose rating is 100 points greater than their opponent's is
      expected to score 64%; if the difference is 200 points, then the expected
      score for the stronger player is 76%.[2] A player's Elo rating is a number
      which may change depending on the outcome of rated games played. After
      every game, the winning player takes points from the losing one. The
      difference between the ratings of the winner and loser determines the
      total number of points gained or lost after a game. If the higher-rated
      player wins, then only a few rating points will be taken from the
      lower-rated player. However, if the lower-rated player scores an upset
      win, many rating points will be transferred. The lower-rated player will
      also gain a few points from the higher rated player in the event of a
      draw. This means that this rating system is self-correcting. Players whose
      ratings are too low or too high should, in the long run, do better or
      worse correspondingly than the rating system predicts and thus gain or
      lose rating points until the ratings reflect their true playing strength.
      Elo ratings are comparative only, and are valid only within the rating
      pool in which they were calculated, rather than being an absolute measure
      of a player's strength. While Elo-like systems are widely used in
      two-player settings, variations have also been applied to multiplayer
      competitions.[3] History Arpad Elo was a master-level chess player and an
      active participant in the United States Chess Federation (USCF) from its
      founding in 1939.[4] The USCF used a numerical ratings system, devised by
      Kenneth Harkness, to allow members to track their individual progress in
      terms other than tournament wins and losses. The Harkness system was
      reasonably fair, but in some circumstances gave rise to ratings which many
      observers considered inaccurate. On behalf of the USCF, Elo devised a new
      system with a more sound statistical basis.[5] At about the same time,
      György Karoly and Roger Cook independently developed a system based on the
      same principles for the New South Wales Chess Association.[6] Elo's system
      replaced earlier systems of competitive rewards with a system based on
      statistical estimation. Rating systems for many sports award points in
      accordance with subjective evaluations of the 'greatness' of certain
      achievements. For example, winning an important golf tournament might be
      worth an arbitrarily chosen five times as many points as winning a lesser
      tournament. A statistical endeavor, by contrast, uses a model that relates
      the game results to underlying variables representing the ability of each
      player. Elo's central assumption was that the chess performance of each
      player in each game is a normally distributed random variable. Although a
      player might perform significantly better or worse from one game to the
      next, Elo assumed that the mean value of the performances of any given
      player changes only slowly over time. Elo thought of a player's true skill
      as the mean of that player's performance random variable. A further
      assumption is necessary because chess performance in the above sense is
      still not measurable. One cannot look at a sequence of moves and derive a
      number to represent that player's skill. Performance can only be inferred
      from wins, draws and losses. Therefore, if a player wins a game, they are
      assumed to have performed at a higher level than their opponent for that
      game. Conversely, if the player loses, they are assumed to have performed
      at a lower level. If the game is a draw, the two players are assumed to
      have performed at nearly the same level. Elo did not specify exactly how
      close two performances ought to be to result in a draw as opposed to a win
      or loss. Actually, there is a probability of a draw that is dependent on
      the performance differential, so this latter is more of a confidence
      interval than any deterministic frontier. And while he thought it was
      likely that players might have different standard deviations to their
      performances, he made a simplifying assumption to the contrary. To
      simplify computation even further, Elo proposed a straightforward method
      of estimating the variables in his model (i.e., the true skill of each
      player). One could calculate relatively easily from tables how many games
      players would be expected to win based on comparisons of their ratings to
      those of their opponents. The ratings of a player who won more games than
      expected would be adjusted upward, while those of a player who won fewer
      than expected would be adjusted downward. Moreover, that adjustment was to
      be in linear proportion to the number of wins by which the player had
      exceeded or fallen short of their expected number.[7] From a modern
      perspective, Elo's simplifying assumptions are not necessary because
      computing power is inexpensive and widely available. Several people, most
      notably Mark Glickman, have proposed using more sophisticated statistical
      machinery to estimate the same variables. On the other hand, the
      computational simplicity of the Elo system has proven to be one of its
      greatest assets. With the aid of a pocket calculator, an informed chess
      competitor can calculate to within one point what their next officially
      published rating will be, which helps promote a perception that the
      ratings are fair. Implementing Elo's scheme The USCF implemented Elo's
      suggestions in 1960,[8] and the system quickly gained recognition as being
      both fairer and more accurate than the Harkness rating system. Elo's
      system was adopted by the World Chess Federation (FIDE) in 1970.[9] Elo
      described his work in detail in The Rating of Chessplayers, Past and
      Present, first published in 1978.[10] Subsequent statistical tests have
      suggested that chess performance is almost certainly not distributed as a
      normal distribution, as weaker players have greater winning chances than
      Elo's model predicts.[11][12] Often in paired comparison data, there’s
      very little practical difference in whether it is assumed that the
      differences in players’ strengths are normally or logistically
      distributed. Mathematically, however, the logistic function is more
      convenient to work with than the normal distribution.[13] FIDE continues
      to use the rating difference table as proposed by Elo.[14]: table 8.1b The
      development of the Percentage Expectancy Table (table 2.11) is described
      in more detail by Elo as follows:[15] The normal probabilities may be
      taken directly from the standard tables of the areas under the normal
      curve when the difference in rating is expressed as a z score. Since the
      standard deviation σ of individual performances is defined as 200 points,
      the standard deviation σ' of the differences in performances becomes σ√2
      or 282.84. The z value of a difference then is D/282.84. This will then
      divide the area under the curve into two parts, the larger giving P for
      the higher rated player and the smaller giving P for the lower rated
      player. For example, let D = 160. Then z = 160/282.84 = .566. The table
      gives .7143 and .2857 as the areas of the two portions under the curve.
      These probabilities are rounded to two figures in table 2.11.'''
  - text: write a python function that counts from 1 to 10?
  - text: If tan A = 3/4, prove that Sin A Cos A = 12/25. solve step by step.

---

# notdiamond-4k-0001

notdiamond-4k-0001 supports a 4096-token input sequence length. This model is an extension of notdiamond-0001, which originally supported a sequence length of 512.
LSG attention is used to adapt the existing pre-trained model so that it efficiently extrapolates to a 4096-token sequence length with no additional training.
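
The 4k variant is produced by a post-hoc architectural conversion rather than a training run. Below is a minimal sketch of such a conversion, assuming the community lsg-converter package (`pip install lsg-converter`) and its `LSGConverter.convert_from_pretrained` interface, and assuming the base model's architecture is supported by that converter; this is an illustration of the technique, not the exact recipe used for this checkpoint.

    from lsg_converter import LSGConverter

    # Target a 4096-token context window for the converted model
    converter = LSGConverter(max_sequence_length=4096)

    # Swap the base router's full attention for LSG (Local-Sparse-Global)
    # attention. No additional training is performed, only this conversion;
    # block-size / sparsity settings are left at the converter's defaults here.
    model, tokenizer = converter.convert_from_pretrained("notdiamond/notdiamond-0001")

    # Save the long-context checkpoint locally (illustrative output path)
    model.save_pretrained("notdiamond-4k-0001")
    tokenizer.save_pretrained("notdiamond-4k-0001")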

notdiamond-0001 automatically determines whether to send queries to GPT-3.5 or GPT-4, depending on which model is best suited for your task. notdiamond-0001 was trained on hundreds of thousands of data points from robust, cross-domain evaluation benchmarks. The notdiamond-0001 router model is a classifier and will return a label for either GPT-3.5 or GPT-4.

## Inference

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Wrap the user query in the routing prompt format the classifier expects
    query = "Can you write a function that counts from 1 to 10?"
    formatted_prompt = f"""Determine whether the following query should be sent to GPT-3.5 or GPT-4.
        Query:
        {query}"""

    # Load the base router checkpoint (for 4096-token inputs, load this
    # repository's LSG checkpoint instead)
    tokenizer = AutoTokenizer.from_pretrained("notdiamond/notdiamond-0001")
    model = AutoModelForSequenceClassification.from_pretrained("notdiamond/notdiamond-0001")

    # Tokenize, truncating to the 4096-token maximum, and run the classifier
    inputs = tokenizer(formatted_prompt, truncation=True, max_length=4096, return_tensors="pt")
    logits = model(**inputs).logits

    # The predicted class id maps to the model the query should be routed to
    model_id = logits.argmax().item()
    id2label = {0: 'gpt-3.5', 1: 'gpt-4'}
    model_to_call = id2label[model_id]
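
Once the router returns a label, the query can be dispatched to the chosen model. The snippet below is a minimal sketch of that step using the `openai` Python client; the client setup and the concrete model names (`gpt-3.5-turbo`, `gpt-4`) are assumptions about a downstream setup, not something shipped with this repository.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Map the router's label onto concrete API model names (assumed mapping)
    target = {"gpt-3.5": "gpt-3.5-turbo", "gpt-4": "gpt-4"}[model_to_call]

    # Send the original query (not the routing prompt) to the selected model
    response = client.chat.completions.create(
        model=target,
        messages=[{"role": "user", "content": query}],
    )
    print(response.choices[0].message.content)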

You can also access the router through their free API; see the official website and documentation for details.