AI & ML interests

AGI and ML Pipelines, Ambient IoT AI, Behavior Cognitive and Memory AI, Clinical Medical and Nursing AI, Genomics AI, GAN Gaming GAIL AR VR XR and Simulation AI, Graph Ontology KR KE AI, Languages and NLP AI, Quantum Compute GPU TPU NPU AI, Vision Image Document and Audio/Video AI

Organization Card
About org cards

Classroom Examples for Today:

HF Features to Check Out First - Boost your Speed:

  1. HF_TOKEN create - Why? Hit quota on free usage and see errors - Solve w this. Also this lets spaces read/write as you.
  2. Model Easy Button with Gradio
  3. https://huggingface.co/spaces/awacke1/Model-Easy-Button1-ZeroShotImageClassifier-Openai-clip-vit-large-patch14
  4. https://huggingface.co/spaces/awacke1/Easy-Button-Zero-Shot-Text-Classifier-facebook-bart-large-mnli
  5. https://huggingface.co/spaces/awacke1/Model-Easy-Button-Generative-Images-runwayml-stable-diffusion-v1-5
  6. https://huggingface.co/spaces/awacke1/Model-Easy-Button-Generative-Text-bigscience-bloom
  7. Check out API Link at Bottom - Gradio auto generates API for you along with usage.
  8. Spaces Embed Button
    1. Bring all four together now into a dashboard!
  9. Space Duplicate Button

Examples 03_16_2023:

  1. HTML5 - Build AI Dashboards with HTML5 Spaces. Spaces Context Menu. Mediapipe. https://huggingface.co/spaces/awacke1/AI.Dashboard.HEDIS.Terminology.Vocabulary.Codes
  2. ChatGPT - Demonstrate three modes including GPT-4 which started this week. https://chat.openai.com/chat
  3. Wikipedia Crowdsource Human Feedback (HF) and Headless URL: https://awacke1-streamlitwikipediachat.hf.space https://huggingface.co/spaces/awacke1/StreamlitWikipediaChat
  4. Cognitive Memory - AI Human Feedback (HF), Wikichat, Tweet Sentiment Dash: https://huggingface.co/spaces/awacke1/AI.Dashboard.Wiki.Chat.Cognitive.HTML5
  5. Twitter Sentiment Graph Example: https://awacke1-twitter-sentiment-live-realtime.hf.space/ Modify to split URL w ChatGPT?
  6. ASR Comparitive Review:
    1. Multilingual Models: jonatasgrosman/wav2vec2-large-xlsr-53-english Space: https://huggingface.co/spaces/awacke1/ASR-High-Accuracy-Test
    2. Speech to Text and Back to Speech in Voice Models: https://huggingface.co/spaces/awacke1/TTS-STT-Blocks Model: https://huggingface.co/facebook/wav2vec2-base-960h
    3. Gradio Live Mode: https://huggingface.co/spaces/awacke1/2-LiveASR Models: facebook/blenderbot-400M-distill nvidia/stt_en_conformer_transducer_xlarge
  7. Bloom Example:
    1. Step By Step w Bloom: https://huggingface.co/spaces/EuroPython2022/Step-By-Step-With-Bloom
  8. ChatGPT with Key Example: https://huggingface.co/spaces/awacke1/chatgpt-demo
    1. Get or revoke your keys here: https://platform.openai.com/account/api-keys
    2. Example fake: tsk-H2W4lEeT4Aonxe2tQnUzT3BlbkFJq1cMwMANfYc0ftXwrJSo12345t

Components for Dash - Demo button to Embed Space to get IFRAME code:

https://huggingface.co/spaces/awacke1/Health.Assessments.Summarizer HEDIS Dash:

  1. HEDIS Related Dashboard with CT: https://huggingface.co/spaces/awacke1/AI.Dashboard.HEDIS

πŸ‘‹ Two easy ways to turbo boost your AI learning journey! πŸ’»

🌐 AI Pair Programming

Open 2 Browsers to:

  1. 🌐 ChatGPT URL or URL2 and
  2. 🌐 Huggingface URL in separate browser windows.

πŸŽ₯ YouTube University Method:

πŸŽ₯ 2023 AI/ML Advanced Learning Playlists:

  1. 2023 Streamlit Pro Tips for AI UI UX for Data Science, Engineering, and Mathematics
  2. 2023 Fun, New and Interesting AI, Videos, and AI/ML Techniques
  3. 2023 Best Minds in AGI AI Gamification and Large Language Models
  4. 2023 State of the Art for Vision Image Classification, Text Classification and Regression, Extractive Question Answering and Tabular Classification
  5. 2023 QA Models and Long Form Question Answering NLP

Cloud Patterns - Dataset Architecture Patterns for Cloud Optimal Datasets:

  1. Azure Blob/DataLake adlfs: https://huggingface.co/docs/datasets/filesystems
  2. AWS: Amazon S3 s3fs: https://s3fs.readthedocs.io/en/latest/
  3. Google Cloud Storage gcsfs: https://gcsfs.readthedocs.io/en/latest/
  4. Google Drive: Google Drive gdrivefs: https://github.com/intake/gdrivefs

Apache BEAM: https://huggingface.co/docs/datasets/beam Datasets: https://huggingface.co/docs/datasets/index

Datasets Spaces - High Performance Cloud Dataset Patterns

  1. Health Care AI Datasets: https://huggingface.co/spaces/awacke1/Health-Care-AI-and-Datasets
  2. Dataset Analyzer: https://huggingface.co/spaces/awacke1/DatasetAnalyzer
  3. Shared Memory with Github LFS: https://huggingface.co/spaces/awacke1/Memory-Shared
  4. CSV Dataset Analyzer: https://huggingface.co/spaces/awacke1/CSVDatasetAnalyzer
  5. Pandas Profiler Report for EDA Datasets: https://huggingface.co/spaces/awacke1/WikipediaProfilerTestforDatasets
  6. Datasets High Performance IMDB Patterns for AI: https://huggingface.co/spaces/awacke1/SaveAndReloadDataset

ChatGPT Prompts Datasets

  1. https://huggingface.co/datasets/fka/awesome-chatgpt-prompts
  2. https://github.com/f/awesome-chatgpt-prompts
  3. Example with role based behavior: I want you to act as a stand-up comedian. I will provide you with some topics related to current events and you will use your wit, creativity, and observational skills to create a routine based on those topics. You should also be sure to incorporate personal anecdotes or experiences into the routine in order to make it more relatable and engaging for the audience. My first request is "I want a humorous story and jokes to talk about the funny things about AI development and executive presentation videos"

Language Models πŸ—£οΈ

πŸ† Bloom sets new record for most performant and efficient AI model in science! 🌸

Comparison of Large Language Models

Model Name Model Size (in Parameters)
BigScience-tr11-176B 176 billion
GPT-3 175 billion
OpenAI's DALL-E 2.0 500 million
NVIDIA's Megatron 8.3 billion
Transformer-XL 250 million
XLNet 210 million

ChatGPT Datasets πŸ“š

  • WebText
  • Common Crawl
  • BooksCorpus
  • English Wikipedia
  • Toronto Books Corpus
  • OpenWebText

ChatGPT Datasets - Details πŸ“š

Big Science Model πŸš€

Datasets:

    • Universal Dependencies: A collection of annotated corpora for natural language processing in a range of languages, with a focus on dependency parsing.
    • WMT 2014: The fourth edition of the Workshop on Statistical Machine Translation, featuring shared tasks on translating between English and various other languages.
    • The Pile: An English language corpus of diverse text, sourced from various places on the internet.
    • HumanEval: A dataset of English sentences, annotated with human judgments on a range of linguistic qualities.
    • FLORES-101: A dataset of parallel sentences in 101 languages, designed for multilingual machine translation.
    • CrowS-Pairs: A dataset of sentence pairs, designed for evaluating the plausibility of generated text.
    • WikiLingua: A dataset of parallel sentences in 75 languages, sourced from Wikipedia.
    • MTEB: A dataset of English sentences, annotated with their entailment relationships with respect to other sentences.
    • xP3: A dataset of English sentences, annotated with their paraphrase relationships with respect to other sentences.
    • DiaBLa: A dataset of English dialogue, annotated with dialogue acts.

Deep RL ML Strategy 🧠

The AI strategies are:

  • Language Model Preparation using Human Augmented with Supervised Fine Tuning πŸ€–
  • Reward Model Training with Prompts Dataset Multi-Model Generate Data to Rank 🎁
  • Fine Tuning with Reinforcement Reward and Distance Distribution Regret Score 🎯
  • Proximal Policy Optimization Fine Tuning 🀝
  • Variations - Preference Model Pretraining πŸ€”
  • Use Ranking Datasets Sentiment - Thumbs Up/Down, Distribution πŸ“Š
  • Online Version Getting Feedback πŸ’¬
  • OpenAI - InstructGPT - Humans generate LM Training Text πŸ”
  • DeepMind - Advantage Actor Critic Sparrow, GopherCite 🦜
  • Reward Model Human Prefence Feedback πŸ†

For more information on specific techniques and implementations, check out the following resources:

  • OpenAI's paper on GPT-3 which details their Language Model Preparation approach
  • DeepMind's paper on SAC which describes the Advantage Actor Critic algorithm
  • OpenAI's paper on Reward Learning which explains their approach to training Reward Models
  • OpenAI's blog post on GPT-3's fine-tuning process

models

None public yet

datasets

None public yet