ccore committed on
Commit 8ede19a
1 Parent(s): cc72aac

Update README.md

Files changed (1)
  1. README.md +49 -119
README.md CHANGED
@@ -1,121 +1,53 @@
  # QA Builder - 32k sequence length
 
- I'm happy to share the training progress of my new language model with a 32k setup. Training is being carried out in six distinct steps to ensure broad coverage and effectiveness. All of the content used was converted to Markdown format, ensuring uniformity and efficient processing. Here are the steps:
-
- 1- Wikipedia Titles + Introduction: The solid foundation of the model is built using Wikipedia titles and introductions, offering an overview of numerous topics.
-
- 2- Titles + Wikipedia Content: The next step deepens the model's understanding by incorporating not only titles but also the full content of Wikipedia articles. Everything was ordered so that the model never learns something without first having seen what it depends on.
-
- 3- Classic Books: To ensure a deep and historical literary understanding, the model is exposed to the texts of classic books, immersing it in the nuances of literary language.
-
- 4- Articles: This step infuses the model with up-to-date and detailed information on various topics, taking advantage of the rich content of articles from different fields of knowledge.
-
- 5- QA (Questions and Answers): To enhance the model's responsiveness and comprehension, it is trained on a question-and-answer dataset.
-
- 6- Rhetoric: Finally, to fine-tune the model's ability to understand and generate persuasive speech, it is exposed to materials rich in rhetorical elements.
-
- I look forward to sharing the results of this exciting project with you all!
-
-
- # **SAMPLE**
- This sample is from a 330-million-parameter model that still has a slightly high loss:
-
-
- # RESPONSE
- In this article, we present the online Viterbi algorithm to decode
- Markov models (HMMs) in much smaller than linear space. Our analysis on
- Two-state HMMs suggest that the maximum expected memory used to decode
- sequence of length $n$ with state $m$ HMM can be as low as $\Theta(m\log n)$,
- without a significant slowdown compared to the classic Viterbi algorithm.
- The classical Viterbi algorithm requires $O(mn)$ space, which is impractical for
- analysis of long sequences of DNA (such as complete chromosomes of the human genome) and
- for continuous data streams. We also experimentally demonstrate the performance
- of Viterbi's online algorithm into a simple HMM for gene localization in both
- simulated and real DNA sequences.
-
- # PROMPT
-
-
-
- \begin{align*}
- Viterbi, $A\cdot\epsilon^2 = 0.2\pi|A^2$$
- \end{align*}
-
- The above code takes two vectors $A$ and $B$, respectively, as a potential padding for $u(n)$ units during sequential fetch. The resulting value for the next unit $u(n)$ is a probability distribution function defined by:
- \begin{align*}
- \frac{P(X)}{B\times Viterbi}{F^3}=\sum_{k=N-1} P(X)=0\log_{k=L-1} P(X) \cdot V([np,np-np_np,np_np_np,np_np_np_np])
- \end{align*}
-
- The code above provides an estimate of the overall loss function of a DNA sequence, which represents the number of classes that need to be extracted from the HMM. The code consists of five convolutional layers with convolutional layers with dense layers that have a sigmin gradient corresponding to the corresponding output.
-
- What does this mean for the K-DNA replication process?
-
- # ENDOFTEXT
-
- I created samples of 32k tokens, which enable exchanges like the following (a schematic of the two-model rhetoric loop is sketched after the sample):
- ## **RHETORIC SAMPLE**
-
- Its mission is to develop the learning of language models, for that you need to create induce with instructions, so that a model, when reading an article, can write the rhetorical question that motivated the published idea. That is, I need you to create the instructions, to motivate the models to understand what was the rhetorical questioning that led to the discovery presented in the paper.
- Create a task knowing that:
- 1- the model will receive a random article from a scientific discovery
- 2- he needs to understand what was the question that the author asked himself to arrive at the discovery.
- 3- the question written to the model will be presented to another model that must answer the question, in this way we will create a rhetoric.
- # INSTRUCTIONS:
- 1- You are working as a language model that needs to develop the learning of language models. Your task is to induce rhetorical questions from scientific discoveries, and motivate the models to understand what was the questioning that led to those discoveries.
- 2- To achieve this task, you will receive a random article from a scientific discovery. You need to read the article carefully and identify the rhetorical question that motivated the author to make that discovery. This question must be presented as the main idea of the article.
- 3- Once you have identified the rhetorical question, you will be provided with additional information about the discovery in the form of a statement or hypothesis. Your task is to ask another model (preferably someone who has limited knowledge of science) whether this statement/hypothesis can be proven or refuted based on the available evidence at that time.
- 4- After receiving the answer, you will present yourself as the author of the article and motivate the models by asking them a rhetorical question related to the topic covered in the article. This question should lead the model to deduce the rhetorical question that motivated the discovery.
- 5- Finally, you will summarize the main idea of the article in one sentence and present it as a rhetorical question that can be further explored and expanded upon. You can use various language models to create this task, making sure to keep the integrity of the scientific discoveries intact.
- I hope these instructions help you create an engaging and informative task for the language models to develop their learning. Let me know if you need any more assistance on this matter.
- # [PAPER]
- > We study the two-particle wave function of paired atoms in a Fermi gas with
- tunable interaction strengths controlled by Feshbach resonance. The Cooper pair
- wave function is examined for its bosonic characters, which is quantified by
- the correction of Bose enhancement factor associated with the creation and
- annihilation composite particle operators. An example is given for a
- three-dimensional uniform gas. Two definitions of Cooper pair wave function are
- examined. One of which is chosen to reflect the off-diagonal long range order
- (ODLRO). Another one corresponds to a pair projection of a BCS state. On the
- side with negative scattering length, we found that paired atoms described by
- ODLRO are more bosonic than the pair projected definition. It is also found
- that at $(k_F a)^{-1} \ge 1$, both definitions give similar results, where more
- than 90% of the atoms occupy the corresponding molecular condensates.
-
- # [RESPONSE]
-
- Based on the article, we can deduce that the main idea is to investigate the behavior of paired atoms in a Fermi gas with tunable interaction strengths controlled by Feshbach resonance under various conditions. The authors aim to study the two-particle wave function of paired atoms and examine its bosonic characters through the correction of Bose enhancement factor associated with the creation and annihilation composite particle operators.
- To motivate the models, we could ask another language model whether they can think of any other potential benefits or applications related to this topic. For example, how might this research contribute to our understanding of other phenomena in physics or chemistry? Or
- , what are some practical implications or challenges associated with controlling the interaction strengths of paired atoms in a Fermi gas?
- In terms of rhetorical questions, we could ask model: "Can you imagine what it would be like to have complete control over the behavior of particles at the atomic level?" Or, "Do you think studying the behavior of paired atoms in a Fermi gas with tunable interaction strengths controlled by Feshbach resonance has any potential implications for our understanding of other aspects of fundamental physics or chemistry?".
- # [RESPONSE]
- While it is difficult to predict exactly how this research may contribute to our understanding of other phenomena in physics or chemistry, some possible directions include:
- - Understanding the behavior of paired atoms in a Fermi gas with tunable interaction strengths controlled by Feshbach resonance could provide insight into fundamental questions related to superfluidity and high temperature superconductivity, which are important areas of research in condensed matter physics.
- - The development of control over the behavior of paired atoms in a Fermi gas with tunable interaction strengths could have practical implications for applications such as quantum computing or precision spectroscopy, where ultra-high sensitivity and resolution are required.
- - Additionally, this research could also challenge our understanding of the nature of superfluidity, which has been debated in the physics community for decades, and may lead to new discoveries and theories about how paired atoms behave when they are in a highly ordered state with tunable interaction strengths controlled by Feshbach resonance.
-
- # [SUMMARY]
- Overall, this research study examines the behavior of paired atoms in a Fermi gas with tunable interaction strengths controlled by Feshbach resonance, and provides insight into fundamental questions related to superfluidity and high temperature superconductivity. It also has practical implications for applications such as quantum computing or precision spectroscopy, where ultra-high sensitivity and resolution are required. Additionally, this research could challenge our understanding of the nature of superfluidity and may lead to new discoveries and theories about how paired atoms behave when they are in a highly ordered state with tunable interaction strengths controlled by Feshbach resonance. [end of text]
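
The rhetoric samples above come out of a simple two-model loop: one model reads a paper and writes the rhetorical question behind it, and a second model answers that question. The sketch below is only an illustration of that loop; the ask() wrapper and all function names are hypothetical, not the code used to build the dataset.

```python
# Schematic of the two-model "rhetoric" loop described above.
# ask() stands in for any text-generation call; all names are illustrative.
def ask(model, prompt: str) -> str:
    """Hypothetical wrapper around a text-generation model."""
    raise NotImplementedError

def build_rhetoric_sample(article: str, writer, answerer) -> dict:
    # Model A reads the paper and writes the rhetorical question
    # that could have motivated the discovery.
    question = ask(writer, "# [PAPER]\n" + article +
                   "\n\nWrite the rhetorical question that motivated this discovery.")
    # Model B answers that question, completing the rhetorical exchange.
    answer = ask(answerer, question)
    return {"paper": article, "question": question, "answer": answer}
```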
-
-
- ## STATUS TRAINING -
- in my last tests with length 2048, I got great models, I trained models in 24 hours with only a 4090 GPU, I'll try to do the same with this 32k, in the following hours and I'll post the result
- In training, step 2/6
- Each stage lasts 4-6 hours.
- I am releasing the partial models, in the end I will also release the datasets. 100% synthetic data in markdown
- 1 - 2.5h OK result :
  (if you have problems on eval, set same max_length)
  | Task |Version|Metric|Value | |Stderr|
  |----------|------:|------|-----:|---|-----:|
@@ -137,8 +69,6 @@ hf-causal (max_length=1280), limit: None, provide_description: False, num_fewsho
  |hellaswag | 0|acc |0.2892|± |0.0045|
  | | |acc_norm|0.3114|± |0.0046|

-
142
-
143
- 2 - RUNNING - next upload 9/9 - 00:30 GMT
144
- 3 -
 
+ ---
+ license: other
+ datasets:
+ - wikipedia
+ pipeline_tag: text-generation
+ tags:
+ - llama2
+ - prompt
+ - reverse prompt
+ - qa
+ - questiona
+ - answer
+ - builder
+ - prompt writer
+ - 32k
+ - long context
+ - real long context
+ - rhetoric
+ - agents
+ - markdown
+ - from scratch
+ ---
+
+
+
 
  # QA Builder - 32k sequence length

+ Welcome to the training progress of my new language model with a 32k configuration. Training a model is a meticulous process, and in this configuration the sequence length stands out as crucial, allowing a greater ability to understand and generate text.
+
+ ## **Why 32k?**
+ Sequence length is critical. A longer sequence allows the model to understand more extensive context, capturing nuances and details that might otherwise be missed. Additionally, with the expansion to 32k, we have more room to incorporate sophisticated elements such as rhetoric, allowing for more persuasive and eloquent expression.
+
+ Here are the steps through which the model is being trained (a rough sketch of how such ordered samples might be packed follows the list):
+
+ 1. **Wikipedia Titles + Introduction**: A solid foundation is built using Wikipedia titles and introductions, providing an overview of various topics.
+ 2. **Titles + Wikipedia Content**: Deepening understanding, the full content of Wikipedia articles is incorporated.
+ 3. **Classic Books**: An immersion in the nuances of historical literary language, training the model with texts from classic books.
+ 4. **Articles**: Incorporating detailed and up-to-date information from articles in different fields of knowledge.
+ 5. **QA (Questions and Answers)**: Improving the model's responsiveness and comprehension with a dataset of questions and answers.
+ 6. **Rhetoric**: Rhetoric plays a vital role in refining the model's ability to understand and generate persuasive speech. For this, it is exposed to materials rich in rhetorical elements.
+
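To make the ordering and packing idea concrete, here is a minimal sketch of how curriculum-ordered Markdown documents could be cut into fixed 32k-token samples. It is an illustration only: the tokenizer choice, the load_markdown_docs() helper, and the stage names are assumptions, not the exact pipeline behind this model.

```python
# Illustrative packing of curriculum-ordered Markdown documents into
# fixed-length 32k-token training samples. Loader and stage names are
# hypothetical; any Llama-2-style tokenizer would work here.
from transformers import AutoTokenizer

SEQ_LEN = 32_768
# Example tokenizer only (this repo is gated); substitute any tokenizer you can load.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def load_markdown_docs(stage):
    """Hypothetical loader: yields Markdown strings for one curriculum stage."""
    # In practice this would stream the stage's converted Markdown files.
    yield f"# {stage}\n\nPlaceholder Markdown document for this stage."

def pack_stage(stage):
    """Concatenate a stage's documents and cut them into full 32k-token windows."""
    buffer, samples = [], []
    for doc in load_markdown_docs(stage):
        ids = tokenizer.encode(doc, add_special_tokens=False)
        buffer.extend(ids + [tokenizer.eos_token_id])  # EOS as document separator
        while len(buffer) >= SEQ_LEN:
            samples.append(buffer[:SEQ_LEN])
            buffer = buffer[SEQ_LEN:]
    return samples

# Stages are kept in curriculum order, so later stages build on earlier ones.
stages = ["wiki_titles_intro", "wiki_full", "classic_books",
          "articles", "qa", "rhetoric"]
packed = {stage: pack_stage(stage) for stage in stages}
```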
+ I look forward to sharing the results of this fascinating project with you all!
+
+ ## **Training Status**
+ In my last tests with a sequence length of 2048, I achieved great models: with just a 4090 GPU, I trained models in 24 hours. I'll try to replicate that success with this 32k setup over the next few hours and post the result.
+
+ I am currently on step 2/6 of training. Each stage lasts 4 to 6 hours. I'm releasing the partial models, and at the end, I will also release the datasets, all 100% synthetic and formatted in Markdown.
+
+ **Results so far** (evaluation of the first partial checkpoint):

  (if you have problems on eval, set same max_length)
  | Task |Version|Metric|Value | |Stderr|
  |----------|------:|------|-----:|---|-----:|
 
  |hellaswag | 0|acc |0.2892|± |0.0045|
  | | |acc_norm|0.3114|± |0.0046|
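
For anyone who wants to try a partial checkpoint, the sketch below shows one way to load it and generate while keeping the same length budget used in the evaluation above (max_length=1280), which is what the note about max_length refers to. The repository id is a placeholder, and the prompt simply mirrors the # RESPONSE / # PROMPT layout of the sample shown earlier.

```python
# Minimal usage sketch. The repo id is a placeholder; point it at the
# actual checkpoint. Keeping max_length equal to the evaluation setting
# (1280 here) avoids the length mismatch mentioned in the note above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ccore/<partial-checkpoint>"  # placeholder, not a real repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "# RESPONSE\nIn this article, we present ...\n\n# PROMPT\n"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_length=1280, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```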
 
+ **Next update:** 9/9 - 02:30 GMT

+ Your contribution and feedback are always valuable. Follow along and share your thoughts as we move forward on this exciting journey!