
Margaux Ammour

mammour

AI & ML interests

Instead of looking for something that is potentially harmful, better to grasp what we can already make happen. Artificially Augmented Intelligence advocate.

Recent Activity

Organizations

Ammour family

mammour's activity

New activity in DebateLabKIT/syncialo-raw 22 days ago

Bad practices correction

#1 opened 22 days ago by
mammour

Following your example:
Your FoF-2 = FoF-1; as it stands, it biases the dataset by over-weighting/over-saturating the same argument as two different ones.

https://argdown.org/syntax/#equivalence-classes

They should look like this:

<Focus on Fundamentals>: Restricting access to fan fiction and social media in schools allows students to prioritize core academic subjects and develop a solid foundation in STEM fields, literature, and critical thinking.
<Focus on Fundamentals>: By limiting access to non-academic online content, schools can redirect students' attention to foundational subjects, fostering a stronger understanding of complex concepts and better retention of critical information.

leading to the following:

[Learning Over Leisure]: Schools should restrict students' access to fan fiction and social media to protect the integrity of education. 
    <- <Restriction Infringes on Freedom of Expression>: Restricting access to fan fiction and social media unconstitutionally limits students' right to freedom of expression and stifles their creativity.
        <+ <Lifelong Learning>: By exercising their freedom of expression, students develop essential skills in critical thinking, problem-solving, and effective communication, preparing them for success in their future careers and personal lives.
        <- <Echo Chamber Effect>: Exercising freedom of expression in an unstructured environment can create an echo chamber where students only communicate with like-minded individuals, failing to develop the skills to engage with diverse perspectives and opposing views.
            <- <Silent Observer>: Developing skills to engage with diverse perspectives and opposing views is not essential for effective communication in situations where listening and observing, rather than actively engaging, is the most effective strategy.
        <- <Fan Fiction Distortion>: Fan fiction and social media often distort students' creativity by promoting unoriginal and copyrighted content, rather than fostering genuine artistic expression.
            <- <Artistic Evolution>: The value of artistic expression lies in its ability to evoke emotions and spark new ideas, regardless of whether it is original or builds upon existing works, making the distinction between original and unoriginal content irrelevant.
        <+ <Innovation Incubator>: Unrestricted freedom of expression enables students to develop critical thinking, problem-solving, and communication skills, essential for academic and professional success.
    <+ <Focus on Fundamentals>: Restricting access to fan fiction and social media in schools allows students to prioritize core academic subjects and develop a solid foundation in STEM fields, literature, and critical thinking.
    <+ <Focus on Fundamentals>: By limiting access to non-academic online content, schools can redirect students' attention to foundational subjects, fostering a stronger understanding of complex concepts and better retention of critical information.
        <+ <Knowledge Pyramid>: A strong grasp of foundational subjects allows students to recognize relationships between different ideas and concepts, creating a hierarchical structure of knowledge that enhances retention and recall of critical information.

Problem solved; now we need to fix the dataset:

Pass all JSONs through:

#!/usr/bin/env python3
"""
Script to fix "almost duplicated" labels in a debate JSON.
It reads an input JSON file (with a "nodes" array where each node has a "label"),
finds labels that are very similar (according to a fuzzy-match threshold),
and then updates all such nodes to share a canonical label.
"""

import json
import sys
import logging
import argparse
from difflib import SequenceMatcher
from typing import List, Dict, Any

# Set up logging configuration
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between two strings (0 to 1)."""
    return SequenceMatcher(None, a, b).ratio()

def cluster_labels(labels: List[str], threshold: float = 0.90) -> Dict[str, str]:
    """
    Given a list of labels, return a dictionary mapping each label to a canonical label.
    Two labels that are at least 'threshold' similar will be treated as duplicates.
    (The first label encountered becomes the canonical version.)
    """
    canonical: Dict[str, str] = {}
    unique_labels = list(set(labels))  # unique labels in no particular order
    unique_labels.sort()  # sort for consistency

    # Build clusters by iterating over the unique labels.
    for i, label in enumerate(unique_labels):
        if label in canonical:
            continue
        canonical[label] = label  # label becomes its own canonical version
        for other_label in unique_labels[i + 1:]:
            if other_label in canonical:
                continue
            if similarity(label, other_label) >= threshold:
                canonical[other_label] = label
    return canonical

def fix_labels(data: Dict[str, Any], threshold: float = 0.90) -> Dict[str, Any]:
    """
    Given a debate JSON object (with a "nodes" key), fix labels by unifying similar ones.
    Returns the modified JSON object.
    """
    if "nodes" not in data:
        logging.error("No 'nodes' key found in JSON data.")
        return data

    nodes = data["nodes"]
    if not isinstance(nodes, list):
        logging.error("'nodes' should be a list.")
        return data

    # Extract all labels; if a node doesn't have a "label", default to an empty string.
    labels = [node.get("label", "") for node in nodes if isinstance(node, dict)]
    
    # Build mapping from each label to its canonical version.
    mapping = cluster_labels(labels, threshold=threshold)
    logging.info("Found %d unique labels; mapping to canonical labels:", len(mapping))
    for key, canonical_label in mapping.items():
        if key != canonical_label:
            logging.info("  %r --> %r", key, canonical_label)

    # Update each node's label using the mapping.
    for node in nodes:
        if isinstance(node, dict):
            original_label = node.get("label", "")
            if original_label in mapping:
                node["label"] = mapping[original_label]
    return data

def parse_args() -> argparse.Namespace:
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Fix almost duplicated labels in a debate JSON file."
    )
    parser.add_argument("input_file", help="Path to the input JSON file.")
    parser.add_argument("output_file", help="Path where the fixed JSON will be saved.")
    parser.add_argument(
        "--threshold", type=float, default=0.90,
        help="Fuzzy matching threshold (default: 0.90)."
    )
    return parser.parse_args()

def main() -> None:
    args = parse_args()

    # Load JSON data from file with error handling.
    try:
        with open(args.input_file, "r", encoding="utf-8") as infile:
            data = json.load(infile)
    except FileNotFoundError:
        logging.error("Input file '%s' not found.", args.input_file)
        sys.exit(1)
    except json.JSONDecodeError as e:
        logging.error("Error decoding JSON from '%s': %s", args.input_file, e)
        sys.exit(1)
    except Exception as e:
        logging.error("An unexpected error occurred while reading '%s': %s", args.input_file, e)
        sys.exit(1)

    # Fix labels in the data.
    fixed_data = fix_labels(data, threshold=args.threshold)

    # Write the fixed data to the output file with error handling.
    try:
        with open(args.output_file, "w", encoding="utf-8") as outfile:
            json.dump(fixed_data, outfile, indent=2, ensure_ascii=False)
    except Exception as e:
        logging.error("An error occurred while writing to '%s': %s", args.output_file, e)
        sys.exit(1)

    logging.info("Fixed JSON written to '%s'", args.output_file)

if __name__ == "__main__":
    main()

with https://huggingface.co/datasets/DebateLabKIT/syncialo-raw/raw/main/data/synthetic_corpus-001/train/debate-train-0444/node_link_data-debate-train-0444.json

we get this stdout:

λ python fix_labels.py input.json output.json
INFO: Found 638 unique labels; mapping to canonical labels:
INFO:   'Algorithmic Bias Amplification' --> 'Algorithmic Amplification'
INFO:   'Biased Benchmarks' --> 'Biased Benchmark'
INFO:   'Crime Deterrent' --> 'Crime Deterrence'
INFO:   'Dataset Augmentation' --> 'Data Augmentation'
INFO:   'Data Deserts' --> 'Data Desert'
INFO:   'Diverse Datasets' --> 'Diverse Data Sets'
INFO:   'Surveillance Slippery Slope' --> 'Mass Surveillance Slippery Slope'
INFO:   'National Security Exemption' --> 'National Security Exception'
INFO:   'Protecting the Vulnerable:' --> 'Protecting the Vulnerable'
INFO:   'Redundant Safeguards' --> 'Redundancy Safeguard'
INFO: Fixed JSON written to 'output.json'

All you need to do is adapt main and run a pass through every file; a minimal sketch follows. At the moment, your dataset is bad practice.
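For instance, a pass-through sketch reusing fix_labels() from the script above (the glob pattern mirrors the file path linked above, and the in-place rewrite is an assumption; adapt both to the actual repository layout):

import json
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

def fix_all(root: str, threshold: float = 0.90) -> None:
    """Apply fix_labels() to every node_link_data JSON under 'root',
    rewriting each file in place."""
    for path in sorted(Path(root).glob("**/node_link_data-*.json")):
        with path.open("r", encoding="utf-8") as infile:
            data = json.load(infile)
        fixed = fix_labels(data, threshold=threshold)  # defined in the script above
        with path.open("w", encoding="utf-8") as outfile:
            json.dump(fixed, outfile, indent=2, ensure_ascii=False)
        logging.info("Rewrote '%s' in place", path)

fix_all("data")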

Credits: me, Argdown docs, AI for [code review] and [error handling].

New activity in mistralai/Mistral-Small-24B-Instruct-2501 23 days ago

Mistral Small 24 B

4
#19 opened 26 days ago by
HandsomeMagyar

Why increase censorship?

21
#20 opened 26 days ago by
notafraud
New activity in Qwen/QwQ-32B-Preview 3 months ago

multi GPU inferencing

2
#18 opened 3 months ago by
cjj2003
New activity in Jacoby746/Casual-Magnum-34B-exl2-4.0bpw 5 months ago

Error during inference

7
#1 opened 5 months ago by
Jellon
reacted to singhsidhukuldeep's post with 👍 5 months ago
Researchers have developed a novel approach called Logic-of-Thought (LoT) that significantly enhances the logical reasoning capabilities of large language models (LLMs).

Here are the steps for how Logic-of-Thought (LoT) is implemented:

-- 1. Logic Extraction

1. Use Large Language Models (LLMs) to identify sentences containing conditional reasoning relationships from the input context.
2. Generate a collection of sentences with logical relationships.
3. Use LLMs to extract the set of propositional symbols and logical expressions from the collection.
4. Identify propositions with similar meanings and represent them using identical propositional symbols.
5. Analyze the logical relationships between propositions based on their natural language descriptions.
6. Add negation (¬) for propositions that express opposite meanings.
7. Use implication (→) to connect propositional symbols when a conditional relationship exists.

-- 2. Logic Extension

1. Apply logical reasoning laws to the collection of logical expressions from the Logic Extraction phase.
2. Use a Python program to implement logical deduction and expand the expressions.
3. Apply logical laws such as Double Negation, Contraposition, and Transitivity to derive new logical expressions.
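A minimal, hypothetical sketch of this expansion in Python, assuming implications from Logic Extraction are encoded as (antecedent, consequent) tuples and negation as a "¬" prefix (this encoding and the helper names are illustrative, not the paper's code):

def negate(p: str) -> str:
    """Negate a symbol, applying Double Negation (¬¬A becomes A)."""
    return p[1:] if p.startswith("¬") else "¬" + p

def extend(implications: set) -> set:
    """Expand a set of implications with Contraposition and
    Transitivity until no new expressions appear."""
    expanded = set(implications)
    changed = True
    while changed:
        changed = False
        for a, b in list(expanded):
            # Contraposition: A → B entails ¬B → ¬A.
            contra = (negate(b), negate(a))
            if contra not in expanded:
                expanded.add(contra)
                changed = True
        for a, b in list(expanded):
            for c, d in list(expanded):
                # Transitivity: A → B and B → C entail A → C.
                if b == c and a != d and (a, d) not in expanded:
                    expanded.add((a, d))
                    changed = True
    return expanded

print(extend({("A", "B"), ("B", "C")}))
# derives ("A", "C"), ("¬B", "¬A"), ("¬C", "¬B"), ("¬C", "¬A"), ...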

-- 3. Logic Translation

1. Use LLMs to translate the newly generated logical expressions into natural language descriptions.
2. Combine the natural language descriptions of propositional symbols according to the extended logical expressions.
3. Incorporate the translated logical information as a new part of the original input prompt.
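Under the same illustrative encoding, Logic Translation could render the derived implications back into natural language before appending them to the prompt (the glosses dict and helper names are assumptions for the sketch):

def describe(p: str, glosses: dict) -> str:
    """Render a (possibly negated) symbol as natural language."""
    if p.startswith("¬"):
        return "it is not the case that " + glosses[p[1:]]
    return glosses[p]

def translate(expr: tuple, glosses: dict) -> str:
    """Render an implication A → B as an if-then sentence that can be
    appended to the original prompt as extra logical context."""
    a, b = expr
    return f"If {describe(a, glosses)}, then {describe(b, glosses)}."

glosses = {"A": "it rains", "B": "the ground gets wet"}
print(translate(("¬B", "¬A"), glosses))
# If it is not the case that the ground gets wet, then it is not the case that it rains.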

-- 4. Integration with Existing Prompting Methods

1. Combine the LoT-generated logical information with the original prompt.
2. Use this enhanced prompt with existing prompting methods like Chain-of-Thought (CoT), Self-Consistency (SC), or Tree-of-Thoughts (ToT).
3. Feed the augmented prompt to the LLM to generate the final answer.

What do you think about LoT?
updated a Space 5 months ago
reacted to bartowski's post with 👍 6 months ago
@victor (is this the only way to "DM" on HF?)

Had a funny thought, would it be at all possible to rework what shows up on our personal HF page?

Picture this: I upload a model to an organization; someone who follows me now has no idea that I've uploaded a model or where, unless they also watch those repos (which also floods them with other notifications).

What if our main Huggingface page was a collection of both models that we've uploaded specifically to our profile, as well as models we've uploaded to organizations? That way it would all be contained in one central followable location, and I wouldn't have concerns about losing followership if I wanted to upload to an organization all of a sudden.
reacted to m-ric's post with 🧠 7 months ago
๐—ฆ๐—”๐—  ๐Ÿฎ ๐—ฟ๐—ฒ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ๐—ฑ: ๐—ก๐—ฒ๐˜„ ๐—ฆ๐—ข๐—ง๐—” ๐—ผ๐—ป ๐˜€๐—ฒ๐—ด๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐˜๐—ถ๐—ผ๐—ป, ๐—ฏ๐˜† ๐—ฐ๐—ผ๐—บ๐—ฏ๐—ถ๐—ป๐—ถ๐—ป๐—ด ๐˜€๐˜†๐—ป๐˜๐—ต๐—ฒ๐˜๐—ถ๐—ฐ ๐—ฑ๐—ฎ๐˜๐—ฎ ๐˜„๐—ถ๐˜๐—ต ๐—ต๐˜‚๐—บ๐—ฎ๐—ป ๐—ณ๐—ฒ๐—ฒ๐—ฑ๐—ฏ๐—ฎ๐—ฐ๐—ธ ๐Ÿš€

It's a model for object segmentation, for both image and video:
👉 input = a text prompt, or a click on a specific object
👉 output = the model draws a mask around the object. In video segmentation, the mask should follow the object's movements (it is then called a masklet)

💪 SAM 2 is 6x faster than the previous version, it now also works on video, and it beats SOTA by far on both image and video segmentation tasks.

How did they pull that?

The main blocker for video segmentation was that data is really hard to collect: to build your training dataset, should you manually draw masks on every frame? That would be way too costly! ➡️ As a result, existing video segmentation datasets have a real lack of coverage: few examples, few masklets drawn.

💡 Key idea: the researchers decided to use a segmentation model to help them collect the dataset.

But then it's a chicken-and-egg problem: you need the model to create the dataset, and vice versa 🤔

⇒ To solve this, they built a data generation system that they scaled up progressively over 3 successive manual annotation phases:

๐—ฆ๐˜๐—ฒ๐—ฝ ๐Ÿญ: Annotators use only SAM + manual editing tools on each frame โ‡’ Create 16k masklets across 1.4k videos

๐—ฆ๐˜๐—ฒ๐—ฝ ๐Ÿฎ: Then train a first SAM 2, add it in the loop to temporally propagate frames, and correct by re-doing a mask manually when an error has occured โ‡’ This gets a 5.1x speedup over data collection in phase 1! ๐Ÿƒ Collect 60k masklets

๐—ฆ๐˜๐—ฒ๐—ฝ ๐Ÿฏ: Now SAM 2 is more powerful, it has the โ€œsingle clickโ€ prompting option, thus annotators can use it with simple clicks to re-annotate data.

They even add a completely automatic step to generate 350k more masklets!
And in turn, the model's performance gradually increases.

I find this a great example of combining synthetic data generation with human annotation 👍