jharshraj (harsh raj)

updated a model 3 days ago

jharshraj/whisper-accented-speech

published a model 3 days ago

jharshraj/whisper-accented-speech

liked 11 datasets 5 months ago

reacted to singhsidhukuldeep's post with 👍 6 months ago

Researchers have developed a novel approach called Logic-of-Thought (LoT) that significantly enhances the logical reasoning capabilities of large language models (LLMs).

Here is how Logic-of-Thought (LoT) is implemented, step by step:

-- 1. Logic Extraction

1. Use Large Language Models (LLMs) to identify sentences containing conditional reasoning relationships from the input context.
2. Generate a collection of sentences with logical relationships.
3. Use LLMs to extract the set of propositional symbols and logical expressions from the collection.
4. Identify propositions with similar meanings and represent them using identical propositional symbols.
5. Analyze the logical relationships between propositions based on their natural language descriptions.
6. Add negation (¬) for propositions that express opposite meanings.
7. Use implication (→) to connect propositional symbols when a conditional relationship exists.
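The extraction steps are performed by an LLM, but the resulting text must still become a structure that a program can reason over. The sketch below is minimal and makes several assumptions. The LLM is assumed to emit one expression per line in the form `A → ¬B`, using identifier-like symbols. The `Literal` and `Implication` types, the parser names, and the example propositions are all hypothetical and do not come from the LoT paper.

```python
import re
from typing import NamedTuple


class Literal(NamedTuple):
    """A propositional symbol, optionally negated (¬)."""
    symbol: str
    negated: bool = False


class Implication(NamedTuple):
    """A conditional relationship: antecedent → consequent."""
    antecedent: Literal
    consequent: Literal


# Optional "¬" followed by an identifier-like symbol; surrounding spaces allowed.
_LITERAL_RE = re.compile(r"\s*(¬?)\s*([A-Za-z_]\w*)\s*")


def parse_literal(text: str) -> Literal:
    """Parse 'A' or '¬A' into a Literal."""
    match = _LITERAL_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a literal: {text!r}")
    return Literal(match.group(2), negated=(match.group(1) == "¬"))


def parse_implication(text: str) -> Implication:
    """Parse an expression such as 'A → ¬B' into an Implication."""
    parts = text.split("→")
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '→' in {text!r}")
    return Implication(parse_literal(parts[0]), parse_literal(parts[1]))


# Hypothetical extraction output: one shared symbol per proposition (step 4),
# plus the implications the LLM found in the context (step 7).
symbols = {
    "A": "a student studies regularly",
    "B": "the student passes the exam",
    "C": "the student graduates",
}
expressions = [parse_implication(line) for line in ["A → B", "B → C"]]
```

Because literals and implications are plain (named) tuples, they are hashable. A later deduction step can therefore collect them in a set and detect duplicates directly.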

-- 2. Logic Extension

1. Apply logical reasoning laws to the collection of logical expressions from the Logic Extraction phase.
2. Use a Python program to implement logical deduction and expand the expressions.
3. Apply logical laws such as Double Negation, Contraposition, and Transitivity to derive new logical expressions.
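The extension phase can be sketched as a fixed-point computation over a set of implications. This is a minimal sketch, not the authors' program. It assumes every expression is an implication between literals. A literal is encoded as a hypothetical `(symbol, negated)` tuple, so Double Negation follows from simply flipping the flag. The code then applies Contraposition and Transitivity repeatedly until no new expression appears.

```python
from itertools import product

# Hypothetical encoding: a literal is (symbol, negated); an implication is (antecedent, consequent).
Literal = tuple[str, bool]
Implication = tuple[Literal, Literal]


def negate(lit: Literal) -> Literal:
    """Flip the negation flag; negating ¬p yields p (Double Negation: ¬¬p ≡ p)."""
    symbol, negated = lit
    return (symbol, not negated)


def extend(expressions: set[Implication]) -> set[Implication]:
    """Close a set of implications under Contraposition and Transitivity."""
    closure = set(expressions)
    while True:
        derived: set[Implication] = set()
        # Contraposition: from p → q derive ¬q → ¬p.
        for p, q in closure:
            derived.add((negate(q), negate(p)))
        # Transitivity: from p → q and q → s derive p → s (skipping trivial p → p).
        for (p, q), (r, s) in product(closure, repeat=2):
            if q == r and p != s:
                derived.add((p, s))
        if derived <= closure:  # fixed point: nothing new was derived
            return closure
        closure |= derived


A, B, C = ("A", False), ("B", False), ("C", False)
base = {(A, B), (B, C)}
new_expressions = extend(base) - base
```

Starting from `A → B` and `B → C`, the newly derived expressions are `A → C`, `¬B → ¬A`, `¬C → ¬B`, and `¬C → ¬A`. The loop always terminates because only finitely many literal pairs can be formed from a finite set of symbols.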

-- 3. Logic Translation

1. Use LLMs to translate the newly generated logical expressions into natural language descriptions.
2. Combine the natural language descriptions of propositional symbols according to the extended logical expressions.
3. Incorporate the translated logical information as a new part of the original input prompt.

-- 4. Integration with Existing Prompting Methods

1. Combine the LoT-generated logical information with the original prompt.
2. Use this enhanced prompt with existing prompting methods like Chain-of-Thought (CoT), Self-Consistency (SC), or Tree-of-Thoughts (ToT).
3. Feed the augmented prompt to the LLM to generate the final answer.
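The sketch below shows one way the translated logic might be spliced into a prompt and combined with two of the listed methods. The first is a zero-shot Chain-of-Thought trigger sentence. The second is Self-Consistency, implemented as a majority vote over sampled answers. The section label, the function names, and the choice of trigger sentence are assumptions, and the actual LLM calls are omitted.

```python
from collections import Counter


def build_lot_prompt(context: str, question: str, logical_info: str, use_cot: bool = True) -> str:
    """Append LoT's translated logic to the original input, optionally adding a CoT trigger."""
    parts = [
        context,
        f"Logical information: {logical_info}",  # hypothetical label for the LoT addition
        f"Question: {question}",
    ]
    if use_cot:
        parts.append("Let's think step by step.")  # zero-shot Chain-of-Thought trigger
    return "\n\n".join(parts)


def majority_vote(answers: list[str]) -> str:
    """Self-Consistency: pick the most frequent final answer among sampled completions."""
    return Counter(answers).most_common(1)[0][0]


prompt = build_lot_prompt(
    context="If a student studies regularly, the student passes the exam.",
    question="The student did not pass the exam. Did the student study regularly?",
    logical_info="If the student does not pass the exam, then the student does not study regularly.",
)
# In practice, these answers would come from several sampled LLM calls on `prompt`.
final = majority_vote(["No", "No", "Yes"])
```

LoT adds information to the input and leaves the reasoning method unchanged, so the same augmented prompt can be fed to CoT, SC, or ToT without modification.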

What do you think about LoT?

reacted to nicolay-r's post with 🔥 6 months ago

📢 Sending a massive amount of data in bulk to a remotely accessed LLM 🤖 with Chain-of-Thought (CoT) 🔗 prompting might result in connection loss.
That may raise a Python exception 💥 and make it hard to restore the content generated so far.
To address this problem, I'm sharing a no-strings, tiny framework that uses SQLite3 to cache each query.
Such caching allows a smooth relaunch in case of any data loss. ☕
With that, happy to share the bulk-chain project; more on it at the links below:

⭐ github: https://github.com/nicolay-r/bulk-chain
📦 PyPI: https://pypi.org/project/bulk-chain/

There are three steps to get started
(see them in the attachment 👇):
✅ 1. Install the library
✅ 2. Declare the CoT schema in a JSON file 📄
✅ 3. Wrap your transformer or use the existing adapters:
https://github.com/nicolay-r/bulk-chain/tree/master/ext

For example, here is the provider for the Replicate IO service (https://replicate.com/):
https://github.com/nicolay-r/bulk-chain/blob/master/ext/replicate.py
which supports LLaMA-3.1-405B, one of the largest publicly available LLMs:
meta-llama/Llama-3.1-405B-Instruct
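The per-query SQLite3 caching idea can be sketched with Python's standard `sqlite3` and `hashlib` modules. This is not bulk-chain's code or API. The `QueryCache` class, the table layout, the default file name, and the `llm` callable are all hypothetical. Each response is committed as soon as it arrives. After a connection loss, a relaunch therefore skips every prompt that already has a stored answer.

```python
import hashlib
import sqlite3
from typing import Callable


class QueryCache:
    """Store every LLM response in SQLite so a relaunch skips already-answered prompts."""

    def __init__(self, path: str = "llm_cache.sqlite") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, prompt TEXT, response TEXT)"
        )

    def query(self, prompt: str, llm: Callable[[str], str]) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        row = self.conn.execute(
            "SELECT response FROM responses WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no remote call
        # A connection loss raises here, before anything is written,
        # so this prompt is simply retried on the next run.
        response = llm(prompt)
        with self.conn:  # commits the insert immediately (rolls back on error)
            self.conn.execute(
                "INSERT INTO responses (key, prompt, response) VALUES (?, ?, ?)",
                (key, prompt, response),
            )
        return response


# Usage with an in-memory database and a stand-in "LLM" that records its calls.
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = QueryCache(":memory:")
first = cache.query("What is 2 + 2?", fake_llm)
second = cache.query("What is 2 + 2?", fake_llm)  # served from SQLite
```

Keying on a hash of the full prompt means any change to the prompt text produces a new query. Identical prompts are sent to the remote model only once.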

replied to nicolay-r's post 6 months ago

I don't understand how this works, but I will give it a try.

liked a dataset 7 months ago

mteb/sts12-sts

Updated Sep 27, 2022 • 5.34k • 12.3k • 7

liked a dataset 11 months ago

storytracer/US-PD-Books

Updated Mar 13, 2024 • 654k • 1.05k • 182

reacted to Pclanglais's post with 🔥 11 months ago

Announcing that we are on our way to solving a long-standing issue in document processing: the correction of OCR mistakes. Pleias is publishing the largest dataset to date with automated OCR correction: 1 billion words in English, French, German and Italian.

OCR quality is a long-standing issue in digitization. Cultural heritage texts are especially affected, because the primary sources are old documents (with many artifacts, blots, and degradation) and because OCR technology is limited for historical scripts. When we released Common Corpus, a 500-billion-word corpus in the public domain, this was the primary criticism.

This recent breakthrough in post-OCR correction was made possible by progress in open LLM research, several months of dedicated training and alignment by Pleias, and HPC resources from GENCI–IDRIS (Grant 2023-AD011014736) on Jean-Zay.

Announcement: https://huggingface.co/blog/Pclanglais/post-ocr-correction

Post-OCR-Correction dataset: PleIAs/Post-OCR-Correction

reacted to xianbao's post with 🔥 11 months ago

So hard to keep up with the pace!!! Lots of new Chinese fine-tunes are being released on HF.

So I asked my agent to create a collection:
xianbao/llama3-zh-662ba8503bdfe51948a28403

code: https://colab.research.google.com/drive/1ap6fP-VytZE367Nqk26DeQqgQkYaf-cD#scrollTo=eljRbYb4c92M

It would be nice to run this regularly. Any thoughts or suggestions on where to host this cron job?

The 11 datasets liked 5 months ago:

alexandreteles/mental-health-conversational-data

Amod/mental_health_counseling_conversations

Kanakmi/mental-disorders

AnikaBasu/MentalHealthDataset

PrinceAyush/Mental_Health_conv

ilyass31/jkhedri-psychology-llama2-dataset

nbertagnolli/counsel-chat

RaviSheel04/Psychology-Data2

samhog/psychology-RLHF

mpingale/mental-health-chat-dataset

Falah/Mental_health_dataset4Fine_Tuning