---
language:
- en
datasets:
- natural_instructions
- the_pile
- cot
- Muennighoff/P3
tags:
- gpt
pipeline_tag: text-generation
inference:
  parameters:
    temperature: 1.0
    top_k: 1
    max_new_tokens: 5
widget:
- text: "Label the sentence based on whether it is related to an adverse drug effect (ADE). Details are described below:\nDrugs: Names of drugs and chemicals that include brand names, trivial names, abbreviations and systematic names were annotated. Mentions of drugs or chemicals should strictly be in a therapeutic context. This category does not include the names of metabolites, reaction byproducts, or hospital chemicals (e.g. surgical equipment disinfectants).\nAdverse effect: Mentions of adverse effects include signs, symptoms, diseases, disorders, acquired abnormalities, deficiencies, organ damage or death that strictly occur as a consequence of drug intake.\nPossible labels:\n1. ADE-related\n2. not ADE-related\n\nSentence: A challenge with clozapine was feasible and showed no clinical symptoms of eosinophilia.\nLabel: not ADE-related\n\nSentence: CONCLUSIONS: These results suggest that clozapine may cause TD; however, the prevalence is low and the severity is relatively mild, with no or mild self-reported discomfort.\nLabel: ADE-related\n\nSentence: Best-corrected visual acuity measurements were performed at every visit.\nLabel: not ADE-related\n\nSentence: These cases were considered unusual in light of the short delay of their onset after initiation of immunosuppressive therapy and their fulminant course: 3 of these patients died of PCP occurring during the first month of treatment with prednisone.\nLabel: ADE-related\n\nSentence: The INR should be monitored more frequently when bosentan is initiated, adjusted, or discontinued in patients taking warfarin.\nLabel: not ADE-related\n\nSentence: NEH must be considered in lupus patients receiving cytotoxic agents to avoid inappropriate use of corticosteroids or antibiotics in this self-limited condition.\nLabel:"
  example_title: "ADE Corpus V2"
- text: "The following is a banking customer service query. Classify the query into one of the 77 categories available.\nPossible labels:\n1. Refund_not_showing_up\n2. activate_my_card\n3. age_limit\n4. apple_pay_or_google_pay\n5. atm_support\n6. automatic_top_up\n7. balance_not_updated_after_bank_transfer\n8. balance_not_updated_after_cheque_or_cash_deposit\n9. beneficiary_not_allowed\n10. cancel_transfer\n11. card_about_to_expire\n12. card_acceptance\n13. card_arrival\n14. card_delivery_estimate\n15. card_linking\n16. card_not_working\n17. card_payment_fee_charged\n18. card_payment_not_recognised\n19. card_payment_wrong_exchange_rate\n20. card_swallowed\n21. cash_withdrawal_charge\n22. cash_withdrawal_not_recognised\n23. change_pin\n24. compromised_card\n25. contactless_not_working\n26. country_support\n27. declined_card_payment\n28. declined_cash_withdrawal\n29. declined_transfer\n30. direct_debit_payment_not_recognised\n31. disposable_card_limits\n32. edit_personal_details\n33. exchange_charge\n34. exchange_rate\n35. exchange_via_app\n36. extra_charge_on_statement\n37. failed_transfer\n38. fiat_currency_support\n39. get_disposable_virtual_card\n40. get_physical_card\n41. getting_spare_card\n42. getting_virtual_card\n43. lost_or_stolen_card\n44. lost_or_stolen_phone\n45. order_physical_card\n46. passcode_forgotten\n47. pending_card_payment\n48. pending_cash_withdrawal\n49. pending_top_up\n50. pending_transfer\n51. pin_blocked\n52. receiving_money\n53. request_refund\n54. reverted_card_payment?\n55. supported_cards_and_currencies\n56. terminate_account\n57. top_up_by_bank_transfer_charge\n58. top_up_by_card_charge\n59. top_up_by_cash_or_cheque\n60. top_up_failed\n61. top_up_limits\n62. top_up_reverted\n63. topping_up_by_card\n64. transaction_charged_twice\n65. transfer_fee_charged\n66. transfer_into_account\n67. transfer_not_received_by_recipient\n68. transfer_timing\n69. unable_to_verify_identity\n70. verify_my_identity\n71. verify_source_of_funds\n72. verify_top_up\n73. virtual_card_not_working\n74. visa_or_mastercard\n75. why_verify_identity\n76. wrong_amount_of_cash_received\n77. wrong_exchange_rate_for_cash_withdrawal\n\nQuery: My card payment was not successful.\nLabel: declined_card_payment\n\nQuery: Is it possible for me to change my PIN number?\nLabel: change_pin\n\nQuery: limits on top ups\nLabel: top_up_limits\n\nQuery: I live in the EU - can I get a card?\nLabel: country_support\n\nQuery: How can I tell the source for my available funds?\nLabel: verify_source_of_funds\n\nQuery: Why am I getting declines when trying to make a purchase online?\nLabel:"
  example_title: "Banking77"
- text: "In law, an overruling sentence is a statement that nullifies a previous case decision as a precedent, by a constitutionally valid statute or a decision by the same or higher ranking court which establishes a different rule on the point of law involved. Label the sentence based on whether it is overruling or not.\nPossible labels:\n1. not overruling\n2. overruling\n\nSentence: see mciver, 134 n.c.app. at 588, 518 s.e.2d at 526.\nLabel: not overruling\n\nSentence: to the extent that paprskar v. state, supra, applied the general test of waiver of constitutional rights set forth in johnson v. zerbst, supra, it is no longer viable.\nLabel: overruling\n\nSentence: narrowstep, 2010 wl 5422405, at *12.\nLabel: not overruling\n\nSentence: accordingly, to the extent of any conflict nemecek v. state, 621 s.w.2d 404 (tex.cr.app. 1980) is overruled.\nLabel: overruling\n\nSentence: the following facts are taken from the administrative record.\nLabel: not overruling\n\nSentence: see scott, supra at 352; commonwealth v. ruffin, 475 mass. 1003, 1004 (2016).\nLabel:"
  example_title: "Overruling"
- text: "Label whether the following tweet contains hate speech against either immigrants or women. Hate Speech (HS) is commonly defined as any communication that disparages a person or a group on the basis of some characteristic such as race, color, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics.\nPossible labels:\n1. hate speech\n2. not hate speech\n\nTweet: #TakeAKnee trump\'s a chump! @user is the real deal-helping so many on his journey through greatness! Even @user can\'t stand #IQ45-stands to CLEAN UP while he\'s in prison. One hot, lucky #Immigrant babe! #OHi12 #VoteBlue @user in the #Columbus area. #BlueWave baby!\nLabel: not hate speech\n\nTweet: HOW REFRESHING! In South Korea, there is no such thing as \'political correctness\" when it comes to dealing with Muslim refugee wannabes via @user\nLabel: hate speech\n\nTweet: New to Twitter-- any men on here know what the process is to get #verified?\nLabel: not hate speech\n\nTweet: UK Pensioner Faces 350 Lashes In Saudi Arabia why does this country exist it does nothing for migrants picks on old men no help from anyone\nLabel: not hate speech\n\nTweet: RT @user Her:I don\'t get what u want outta this relationship Him:Well, I was only looking for a bj but u kept coming back\nLabel: not hate speech\n\nTweet: Dont worry @user you are and will always be the most hysterical woman.\nLabel:"
  example_title: "Tweet Eval Hate"
---

**Together Research**

# Model Summary

We present GPT-JT, a fork of GPT-J (6B) trained for 20,000 steps, which outperforms most 100B+ parameter models on classification benchmarks and improves on most other tasks relative to the base GPT-J. GPT-JT was trained with a new decentralized training algorithm over a 1Gbps interconnect.

# Quick Start

```python
from transformers import pipeline

pipe = pipeline(model='togethercomputer/GPT-JT-6B-v1')
pipe('''Please answer the following question:\n\nQuestion: Where is Zurich?\nAnswer:''')
```

A fuller few-shot classification example, matching the hosted widget settings, is given at the end of this card.

# Training Data

We fine-tune [GPT-J-6B](https://huggingface.co/EleutherAI/gpt-j-6B) on the following datasets:

- [Natural-Instructions](https://github.com/allenai/natural-instructions)
- [P3](https://huggingface.co/datasets/Muennighoff/P3)
- [MMLU-COT](https://github.com/jasonwei20/flan-2/blob/main/mmlu-cot.json)
- [The Pile](https://huggingface.co/datasets/the_pile)

# Hyperparameters

We used AdamW with a learning rate of 1e-5 and a global batch size of 64, and trained for 20k steps. We used mixed-precision training, keeping activations in FP16 while the optimizer states are kept in FP32. We use both data parallelism and pipeline parallelism to conduct training. During training, we truncate each input sequence to 2048 tokens; sequences shorter than 2048 tokens are concatenated into one long sequence to improve data efficiency (a minimal packing sketch is given at the end of this card).

# Infrastructure

We used [the Together Research Computer](https://together.xyz/) to conduct training.
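# Few-Shot Classification Example

The hosted widget examples in the metadata above run with `temperature=1.0`, `top_k=1`, and `max_new_tokens=5`. The sketch below shows one way to reproduce that setup locally with the `transformers` text-generation pipeline; the two-shot sentiment prompt is made up for illustration and is not part of any benchmark used to evaluate GPT-JT.

```python
from transformers import pipeline

# Load GPT-JT through the text-generation pipeline.
pipe = pipeline(model='togethercomputer/GPT-JT-6B-v1')

# Illustrative few-shot prompt in the same style as the widget examples above.
prompt = (
    "Label the sentence as positive or negative.\n"
    "Possible labels:\n1. positive\n2. negative\n\n"
    "Sentence: The battery dies within an hour.\nLabel: negative\n\n"
    "Sentence: Setup was quick and painless.\nLabel: positive\n\n"
    "Sentence: The screen is bright and sharp.\nLabel:"
)

# Decoding settings taken from the widget configuration; top_k=1 makes this effectively greedy.
out = pipe(prompt, do_sample=True, temperature=1.0, top_k=1, max_new_tokens=5)

# Print only the model's completion for the label slot.
print(out[0]["generated_text"][len(prompt):].strip())
```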
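# Sequence Packing Sketch

The Hyperparameters section describes truncating inputs to 2048 tokens and concatenating shorter sequences to improve data efficiency. The sketch below is a minimal illustration of that kind of packing, not the actual training pipeline; the EOS separator token and the decision to drop the trailing partial block are assumptions made for the example.

```python
from typing import Iterable, List

BLOCK_SIZE = 2048  # sequence length used during GPT-JT fine-tuning


def pack_sequences(tokenized_examples: Iterable[List[int]],
                   eos_token_id: int,
                   block_size: int = BLOCK_SIZE) -> List[List[int]]:
    """Concatenate tokenized examples into a stream and cut it into
    fixed-size blocks. Illustrative only."""
    blocks: List[List[int]] = []
    buffer: List[int] = []
    for tokens in tokenized_examples:
        tokens = tokens[:block_size]   # truncate over-long examples to block_size tokens
        buffer.extend(tokens)
        buffer.append(eos_token_id)    # assumption: an EOS token separates packed examples
        while len(buffer) >= block_size:
            blocks.append(buffer[:block_size])
            buffer = buffer[block_size:]
    # Assumption: the trailing partial block is dropped rather than padded.
    return blocks
```

For example, with `block_size=8` and `eos_token_id=0`, packing `[[1, 2, 3], [4, 5], [6, 7, 8, 9]]` yields a single block `[1, 2, 3, 0, 4, 5, 0, 6]`, with the leftover tokens discarded.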