chansung park (chansung)

chansung's activity

posted an update 1 day ago
πŸ¦™πŸ¦™ LLaMA Duo project update

Last time, I gave a brief introduction to the LLaMA Duo project with @sayakpaul. It is a simple toolset for aligning an sLLM with a service LLM using a coverage dataset 👉🏻 (https://huggingface.co/posts/chansung/708646454991943).
- A coverage dataset holds what we believe to be the most important/desired (instruction, response) pairs. In systems thinking, each instruction is analogous to a function in traditional programming: we write unit tests and measure the coverage % across all features/functions. Similarly, we need to check what % of instructions from the coverage dataset our fine-tuned model handles satisfactorily (hence "coverage dataset").
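
To make the analogy concrete, here is a minimal sketch of the coverage metric, not LLaMA Duo's actual code: `judge_score` is a hypothetical stand-in for an LLM-as-judge call, and `generate` stands in for the fine-tuned sLLM.

```python
# A minimal sketch of the coverage idea; judge_score is a hypothetical
# stand-in for an LLM-as-judge call that compares against the service
# LLM's reference response (not LLaMA Duo's actual implementation).

def judge_score(instruction: str, reference: str, candidate: str) -> float:
    """Return a 0..1 similarity/preciseness score from a judge LLM (stubbed)."""
    ...

def coverage(dataset, generate, threshold: float = 0.7) -> float:
    """% of (instruction, response) pairs the fine-tuned model handles well."""
    passed = 0
    for instruction, reference in dataset:  # (instruction, response) pairs
        candidate = generate(instruction)   # output of the fine-tuned sLLM
        if judge_score(instruction, reference, candidate) >= threshold:
            passed += 1
    return 100.0 * passed / len(dataset)
```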

We tested it with the "Coding" category of the HuggingFaceH4/no_robots dataset, which has about 300 SFT training data points under that category. After fine-tuning the Gemma 7B model on it, the result was very poor: LLaMA Duo's evaluation tool reported < 20% on the similarity and preciseness metrics over the test split.

So, we used LLaMA Duo's synthetic data generation tool to generate 60k data points that look similar to the original dataset. We first created ~10k synthetic data points, then created 50k more based on the synthetic dataset itself.
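
Pictured roughly in code, the two-stage bootstrapping could look like this; `llm` and the seed variables are placeholders I made up, and the prompt wording is illustrative rather than the project's actual prompt:

```python
import json
import random

def synthesize(llm, seeds, n):
    """Ask an LLM for new (instruction, response) pairs similar to sampled seeds."""
    pairs = []
    while len(pairs) < n:
        examples = random.sample(seeds, k=min(3, len(seeds)))
        prompt = (
            "Write one new (instruction, response) pair as JSON with keys "
            "'instruction' and 'response', similar in topic and style to "
            "these examples:\n" + json.dumps(examples)
        )
        pairs.append(json.loads(llm(prompt)))
    return pairs

# Stage 1: ~10k pairs seeded by the ~300 original Coding examples.
# stage_1 = synthesize(llm, coding_seed_pairs, 10_000)
# Stage 2: 50k more, seeded by the stage-1 synthetic set itself.
# stage_2 = synthesize(llm, stage_1, 50_000)
```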

After fine-tuning Gemma 7B on the 60k synthetic dataset, the evaluation results went up to 80-90%. Also, when trying the model out in a UI, it tends to give good responses.

It is a good showcase of transitioning from a service LLM to an sLLM, or of keeping a backup sLLM for service LLM failure scenarios. I am going to expand these experiments to all categories of the no_robots dataset, which will generate roughly > 100k data points.

Here are some links:
- LLaMA Duo project repo: https://github.com/deep-diver/llamaduo
- 60k Coding synthetic dataset: chansung/merged_ds_coding
- Fine-tuned Gemma 7B model: chansung/coding_llamaduo_60k_v0.2
posted an update 8 days ago
πŸ’» Smoothing the Transition from Service LLM to Local LLM

Imagine your go-to LLM service is down, or you need to use it offline – yikes! This project is all about having that "Plan B" ready to go. Here's LLaMA Duo, which I've been building with @sayakpaul:

✨ Fine-tune a smaller LLM: We used Hugging Face's alignment-handbook to teach a smaller-sized LLM to mimic my favorite large language model. Think of it as that super-smart AI assistant getting a capable understudy.

πŸ€– Batch Inference: Let's get that fine-tuned LLM working! My scripts generate lots of text like a champ, and we've made sure things run smoothly even with bigger workloads.

🧐 Evaluation: How well is my small LLM doing? We integrated the Gemini API to use it as an expert judge – it compares my model's output to the original. Talk about a tough critic! (A minimal sketch of this judging step follows after this list.)

πŸͺ„ Synthetic Data Generation: Need to boost that model's performance? Using Gemini's feedback, we can create even more training data, custom-made to make the LLM better.

🧱 Building Blocks: This isn't just a one-time thing – it's a toolkit for all kinds of LLMOps work. Want to change your evaluation metrics? Bring in models trained differently? Absolutely, let's make it happen.
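
As promised above, here is a minimal sketch of the Gemini-as-judge evaluation step; the prompt and the 0-100 rubric are illustrative, not LLaMA Duo's exact implementation:

```python
import google.generativeai as genai

genai.configure(api_key="...")  # your Gemini API key
judge = genai.GenerativeModel("gemini-1.0-pro")

def judge_response(instruction: str, reference: str, candidate: str) -> str:
    # Grade the small model's answer against the service LLM's reference;
    # the rubric wording below is made up for illustration.
    prompt = (
        f"Instruction:\n{instruction}\n\n"
        f"Reference answer:\n{reference}\n\n"
        f"Candidate answer:\n{candidate}\n\n"
        "Score the candidate's similarity to the reference and its "
        "preciseness, each from 0 to 100, and answer as JSON."
    )
    return judge.generate_content(prompt).text
```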

Why this project is awesome:

πŸ’ͺ Reliability: Keep things running no matter what happens to your main LLM source.
πŸ”’ Privacy: Process sensitive information on your own terms.
πŸ—ΊοΈ Offline capable: No internet connection? No problem!
πŸ•°οΈ Version Control: Lock in your favorite LLM's behavior, even if the service model changes.

We're excited to share the code on GitHub. Curious to see what you all think! 👉🏻 https://github.com/deep-diver/llamaduo
replied to their post about 1 month ago

awesome! going to add one more env var to switch mode then :)

posted an update about 1 month ago
Realize your LLM-powered idea on a Hugging Face Space.

I made a Space for you to duplicate; it comes with Gradio and an LLM served by Hugging Face's efficient Text Generation Inference (TGI) framework, packed into a single machine.

It provides a sample app code snippet using gr.ChatInterface. However, it is not limited to chat usage; you can leverage the efficiency of TGI for any sort of app built with Gradio.
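
For a flavor of what such a snippet looks like, here is a minimal gr.ChatInterface wired to a local TGI server via huggingface_hub; the port and generation parameters are my assumptions, so check the Space's own code for the real version:

```python
import gradio as gr
from huggingface_hub import InferenceClient

# TGI runs on the same machine as the Gradio app; the port is an assumption.
client = InferenceClient("http://127.0.0.1:8080")

def respond(message, history):
    # Stream tokens from the TGI server into the chat window.
    partial = ""
    for token in client.text_generation(message, max_new_tokens=256, stream=True):
        partial += token
        yield partial

gr.ChatInterface(respond).launch()
```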

Have you ever enjoyed playing with Hugging Chat? Then you will enjoy building your own idea with this, because both are built on top of TGI!

Focus on your app code, and go beyond chat!

chansung/gradio_together_tgi
posted an update about 2 months ago
πŸŽ₯ 🀾 Vid2Persona: talk to person from video clip

A fun project over the last week with @sayakpaul. It has a simple pipeline, from extracting the traits of video characters to chatting with them.

Under the hood, this project leverages the power of both commercial and open source models. We used Google's Gemini 1.0 Pro Vision model to understand the video content directly, then we used the HuggingFaceH4/zephyr-7b-beta model to hold the conversation!
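
As a rough sketch of how the two stages could be wired (the real project sends the video content to Gemini directly, while this approximation samples a few frames; the prompts and helper names are illustrative, not the project's code):

```python
import cv2
import google.generativeai as genai
from huggingface_hub import InferenceClient
from PIL import Image

genai.configure(api_key="...")  # your Gemini API key

def extract_traits(video_path: str) -> str:
    # Sample a few frames and ask Gemini Pro Vision to describe the character.
    cap, frames = cv2.VideoCapture(video_path), []
    while len(frames) < 4:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        cap.set(cv2.CAP_PROP_POS_FRAMES, cap.get(cv2.CAP_PROP_POS_FRAMES) + 30)
    vision = genai.GenerativeModel("gemini-pro-vision")
    return vision.generate_content(
        ["Describe this character's personality and speaking style.", *frames]
    ).text

def chat(traits: str, user_msg: str) -> str:
    # Zephyr's chat template, filled with the extracted persona.
    client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")
    prompt = (
        f"<|system|>\nYou are this persona: {traits}</s>\n"
        f"<|user|>\n{user_msg}</s>\n<|assistant|>\n"
    )
    return client.text_generation(prompt, max_new_tokens=256)
```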

Try it on the Hugging Face Space and let us know what you think.
: chansung/vid2persona

The Space application is a dedicated implementation for the ZeroGPU environment + Hugging Face Inference API with a PRO account. If you wish to host it in your own environment, consider duplicating the Space or running it locally with the project repository
: https://github.com/deep-diver/Vid2Persona
posted an update about 2 months ago
Updating the PaperQA Gradio app and Hugging Face Space.
: Link ➑️ chansung/paper_qa
: Standalone repo ➑️ https://github.com/deep-diver/paperqa-ui

The final goal is to let people have their own paper archive. In the end, you will be able to easily *clone* it locally or on a Hugging Face Space with a Google Gemini API key (which is free) and a Hugging Face access token. You can just drop arXiv IDs at the bottom, then all the automatically analyzed papers are archived in a Hugging Face Dataset repo.

Here are a few of the updates included; dig into the source code if you want similar features for your own use cases!
🖥️ making the complex UI fully responsive
+ making the UI react as quickly as possible (avoiding server-client round trips when possible)
💬 permanent chat history management with in-browser local storage
+ chat history management *per* paper
+ chat history management in lazy mode (with too many papers, it is impossible to create chat history for every single paper beforehand)

The current plan is to support Gemini and any open source models via a Hugging Face PRO account, but I will expand it to GPT-4 soon.

Any suggestions on this project are welcome! Possibly:
- hooking up a RAG system (open models' context lengths are small)
- hooking up an internet search system
- image/figure analysis
....
posted an update about 2 months ago
Understand research papers more easily with Q&As automatically generated by an LLM (Gemini 1.0 Pro). For this purpose, I have built two projects.

- [Auto Paper Analysis](https://github.com/deep-diver/auto-paper-analysis) lets you generate QAs for a list of papers. The paper list can be specified either from Hugging Face's Daily Papers or as a set of raw arXiv IDs. The generated QA dataset can then be pushed to a Hugging Face Dataset. Refer to the attached image; a minimal sketch of the idea also follows below.

- [PaperQA Space application]( chansung/paper_qa) shows how to interact with the generated QA dataset. Search for a paper by keyword or date, then understand it through the QAs (in ELI5 and technical versions). Check out the attached video, or visit the Space directly.
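
Here is the promised sketch of the Auto Paper Analysis idea; the endpoint is Hugging Face's public Daily Papers API, but the field names and the prompt are my reading of it, not the project's actual code:

```python
import requests
import google.generativeai as genai

genai.configure(api_key="...")  # your Gemini API key
model = genai.GenerativeModel("gemini-1.0-pro")

# Hugging Face's public Daily Papers API.
papers = requests.get("https://huggingface.co/api/daily_papers").json()

qas = []
for item in papers[:5]:
    paper = item["paper"]
    prompt = (
        "Write three Q&A pairs that help a reader understand this paper, "
        "including one in ELI5 style.\n\n"
        f"Title: {paper['title']}\nAbstract: {paper['summary']}"
    )
    qas.append({"arxiv_id": paper["id"], "qa": model.generate_content(prompt).text})
```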

This is a baby step toward automated paper analysis (summarization) for easily consuming the exploding amount of information in the field of AI. In the next phase, I will need to spend time enhancing the prompt engineering, the UI/UX (such as a Like/Dislike system), ...

In the meantime, however, I hope this project can be helpful for anyone who struggles to understand papers (new papers come out before I have even finished reading yesterday's)!

Also, if you have any suggestions to improve this, please let me know :)

replied to shivance's post 3 months ago

Is this an effort to run PyTorch-based models on device? I thought Flutter supported more than TFLite

replied to shivance's post 3 months ago

I had an opportunity to play with Flutter 3 years ago with @sayakpaul, and it sure looks pretty interesting

replied to their post 3 months ago

Thanks @samusenps for sharing the interesting paper.

Yeah, it makes a lot of sense to me. I have been working as a professional book translator for a few years as a side job. Translators have very little income while they have to spend a lot of time producing quality translations (there is no 1:1 mapping; it is really like recreating the original book).

With that perspective/experience in the translation industry, I have no doubt that most content on the web is machine translated.

posted an update 3 months ago
Update on the 🤗 Daily Papers Newsletter

Automatic Korean translation is integrated. In the newsletter, "KO" links appear, and they bring you to the translated version of the full paper. This is done with the following workflow.

1. Grab the list of arXiv IDs from the 🤗 Daily Papers API
2. Distribute sub-lists of arXiv IDs to VMs (possibly spot instances, since the job ends shortly)
3. Commit & push the translated papers in HTML to the designated GitHub repository
4. Include the links to each paper's HTML in the newsletter

Job distribution across a number of VMs is done super easily with [dstack]( https://dstack.ai/ ), and the translation sub-workflow is: 1) download the PDF of each paper with the arxiv-dl package, 2) convert PDF to text with the nougat-ocr package, 3) use a custom trained model ( nlp-with-deeplearning/enko-t5-small-v0 ) in 🤗 transformers to translate the English text into Korean line by line, and 4) reformat the translation into HTML.
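
Step 3 could look roughly like this with 🤗 transformers; whether the model expects a task prefix is an assumption on my part, so consult the model card:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "nlp-with-deeplearning/enko-t5-small-v0"  # the model named above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

def translate_lines(english_text: str) -> str:
    korean = []
    for line in english_text.splitlines():
        if not line.strip():
            korean.append(line)
            continue
        # If the model was trained with a task prefix, prepend it here
        # (an assumption; check the model card).
        inputs = tokenizer(line, return_tensors="pt", truncation=True)
        output = model.generate(**inputs, max_new_tokens=256)
        korean.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return "\n".join(korean)
```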

Many people in Korea are not fluent in English but want to learn about new developments in AI, so they usually rely on Google Translate or other services. This is why I made this feature: easier, direct access to SOTA knowledge.

Are there other countries with similar needs? If so, it would be wonderful to cooperate to support more languages. Please reach out if you are interested in this.

PS: I always wanted to show the usefulness of open ML models by building a well-working end-to-end product, and this newsletter does so by featuring T5ForConditionalGeneration (translation) and SOLAR LLM (summarization).

if you want to subscribe to the newsletter
: https://groups.google.com/g/hf-daily-paper-newsletter

if you want to look into the source code
: https://github.com/deep-diver/hf-daily-paper-newsletter
Β·
replied to merve's post 3 months ago

@merve

the link to the GitHub repository is broken (it contains ')' at the end).
Thanks for sharing your work, by the way!

replied to their post 3 months ago

@sayakpaul

sounds cool.

  • so we need to pick some designated cards of our choice
  • run diffusion models on each with the same and different parameters
posted an update 3 months ago
Want to read a curated list of papers by @akhaliq in your mailbox?

Thanks to the API provided by Hugging Face, I made a simple GitHub Actions-based newsletter bot to send out 🤗 Daily Papers. Check out the attached video clip to get a sense of what it is!

Internally, it leverages the Gemini API to assign tags to each paper, and all papers are archived by tag and batch. Of course, you can go directly to each paper's page from your mailbox to read the full paper!

Since everything is automated, GitHub Actions and the Gemini API are free, and subscription management is free via Google Groups, this newsletter bot is entirely free. Furthermore, if you wish, you can fork the project for your own newsletter service.

subscription: https://groups.google.com/g/hf-daily-paper-newsletter
project repo: https://github.com/deep-diver/hf-daily-paper-newsletter

In the next step, I will experimentally add an auto-translation (to Korean) feature for every paper.
replied to their post 4 months ago

@sayakpaul

I think we can consider using the cheapest yet reasonable alternative. It's okay not to exhaustively consider all the specs. For example, it won't make much sense to do an SDXL deployment using a 4GB card. So, something in the range of 16-24GB should suffice.

Sure! What did you mean by "highest efficiency", by the way? I think I missed some of your points.

replied to their post 4 months ago

Like in terms of different choices of hardware specs, cloud providers, and regions, or something?

replied to victor's post 4 months ago

I would use phi-2 on device for daily conversation and detecting user intent, then pass to a much larger cloud-hosted LLM for more complicated stuff (such as web browsing and more).

replied to victor's post 4 months ago

Not exactly replicable, given the absence of a LAM

posted an update 4 months ago
As one of the GPU poor, I found a nice open source project.

dstack is the perfect open source project for the GPU poor. Simply specify your resource requirements (GPU RAM, spot, ...), and it will suggest the cheapest options among the popular GPU cloud providers (AWS, GCP, Azure, Lambda Labs, TensorDock, and vast.ai).

It provisions VM instances for 3 different use cases, which are the essentials for any ML project.
- Dev: connect provisioned VM instance to your fav IDE (including Jupyter)
- Task: run experiments (training, fine-tuning, ...) via SSH
- Service: run your model in production via HTTPS

dstack is 100% open source, but you need to have your own account with each GPU cloud provider, enough GPU quota, configured credentials, etc., all managed by yourself. Luckily, dstack will announce dstack Cloud, which saves you from all these hassles. The price is almost the same as connecting to each cloud directly with your own account.

The attached code snippet shows how to provision a Mistral-7B model with Text Generation Inference (TGI) on the cheapest VM instance (having 24GB of VRAM). You then get an HTTPS connection to it and can play with it as usual with the TGI client library, as shown in the second attached code snippet.
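
The second snippet's idea, in a minimal form with the text-generation client library (the endpoint URL is a placeholder for whatever HTTPS address dstack gives you):

```python
from text_generation import Client

# Point the TGI client at the HTTPS endpoint dstack exposes for the service.
client = Client("https://<your-dstack-service-endpoint>")

response = client.generate(
    "Explain the difference between spot and on-demand instances.",
    max_new_tokens=200,
)
print(response.generated_text)
```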

If you want to learn more about dstack, check out the official website. For an individual open source contributor in ML without GPU sponsors, this kind of project is pretty important.
: https://dstack.ai/

If you are looking for an alternative, there is the SkyPilot project as well
: https://github.com/skypilot-org/skypilot
Β·