---
title: 🔍Wikipedia Twitter ChatGPT Memory Chat🏊
emoji: 🌟GPT🔍
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 3.24.1
app_file: app.py
pinned: false
license: mit
---
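
This Space runs a Gradio app (`app.py`, not shown here). For orientation, the following is a minimal, purely illustrative skeleton of a Gradio text-in/text-out app compatible with the 3.x SDK declared above; the `respond` function and its placeholder logic are hypothetical stand-ins for the real app's Wikipedia/Twitter/ChatGPT wiring.

```python
# Minimal illustrative Gradio skeleton; NOT the actual app.py of this Space.
import gradio as gr

def respond(message: str) -> str:
    # Placeholder logic; the real Space would route this through
    # Wikipedia search, Twitter, and ChatGPT with memory.
    return f"You said: {message}"

demo = gr.Interface(
    fn=respond,
    inputs="text",
    outputs="text",
    title="Wikipedia Twitter ChatGPT Memory Chat",
)

if __name__ == "__main__":
    demo.launch()
```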
|
## ChatGPT Datasets 📚
|
- WebText
- Common Crawl
- BooksCorpus
- English Wikipedia
- Toronto Books Corpus
- OpenWebText
|
## ChatGPT Datasets - Details 📚
|
- **WebText:** A dataset of web pages scraped from outbound links in Reddit posts with at least 3 karma. This dataset was used to pretrain GPT-2.
  - [Language Models are Unsupervised Multitask Learners](https://paperswithcode.com/dataset/webtext) by Radford et al.
|
- **Common Crawl:** A regularly updated crawl of web pages from a wide variety of domains; a filtered version of this dataset was used to pretrain GPT-3.
  - [Language Models are Few-Shot Learners](https://paperswithcode.com/dataset/common-crawl) by Brown et al.
|
- **BooksCorpus:** A dataset of over 11,000 books from a variety of genres.
  - [Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books](https://paperswithcode.com/dataset/bookcorpus) by Zhu et al.
|
- **English Wikipedia:** A dump of English-language Wikipedia taken in 2018, covering articles written from 2001 through 2017.
  - [WikipediaUltimateAISearch](https://huggingface.co/spaces/awacke1/WikipediaUltimateAISearch): a Hugging Face Space for Wikipedia search.
|
- **Toronto Books Corpus:** A dataset of over 7,000 books from a variety of genres, collected by researchers at the University of Toronto.
  - [Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond](https://paperswithcode.com/dataset/bookcorpus) by Artetxe and Schwenk.
|
- **OpenWebText:** An open-source recreation of the WebText corpus, built from web pages filtered to remove low-quality and spammy content; it is commonly used to train open replications of GPT-2.
  - [OpenWebText Corpus](https://paperswithcode.com/dataset/openwebtext) by Gokaslan and Cohen.
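
Several of these corpora, or open recreations of them, can be explored with the Hugging Face `datasets` library. Below is a minimal sketch, assuming the `openwebtext` and `wikipedia` datasets remain hosted under those Hub IDs with the configs shown; exact names and configs may change over time.

```python
# Minimal sketch: stream samples from two of the corpora above via the
# Hugging Face `datasets` library. Hub IDs and config names are assumptions;
# streaming avoids downloading the full corpora up front.
from datasets import load_dataset

# OpenWebText: an open recreation of OpenAI's WebText corpus.
openwebtext = load_dataset("openwebtext", split="train", streaming=True)

# English Wikipedia: a preprocessed dump (the config name is the dump date).
wikipedia = load_dataset("wikipedia", "20220301.en", split="train", streaming=True)

# Peek at the first record from each stream.
print(next(iter(openwebtext))["text"][:200])
print(next(iter(wikipedia))["title"])
```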
|
|