aifeifei798 (aifeifei)

posted an update 2 days ago

Post

303

how to load a dataset using the datasets library and save it to an SQLite database. It also includes a function to query the database and print the first five rows.

from datasets import load_dataset
import sqlite3

# Load the dataset
dataset = load_dataset('aifeifei798/song_lyrics_min', split='train')

# Define a function to save the dataset to an SQLite database
def save_dataset_to_sqlite(dataset, db_path='temp_dataset.db'):
    # Connect to the SQLite database (creates a new database if it doesn't exist)
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Create a table to store the dataset
    cursor.execute('''CREATE TABLE IF NOT EXISTS songs
                      (id INTEGER PRIMARY KEY, title TEXT, tag TEXT, lyrics TEXT)''')

    # Insert each row of the dataset into the database table
    for i, row in enumerate(dataset):
        cursor.execute("INSERT INTO songs (id, title, tag, lyrics) VALUES (?, ?, ?, ?)",
                       (i, row['title'], row['tag'], row['lyrics']))

    # Commit the transaction and close the connection
    conn.commit()
    conn.close()

# Save the dataset to the SQLite database
save_dataset_to_sqlite(dataset)

# Define a function to query the database
def query_database(db_path='temp_dataset.db'):
    # Connect to the SQLite database
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Query the first five rows of the database
    cursor.execute("SELECT * FROM songs LIMIT 5")
    rows = cursor.fetchall()

    # Print each row
    for row in rows:
        print(row)

    # Close the connection
    conn.close()

# Query the database
query_database()

reacted to ritvik77's post with 👍 26 days ago

Post

2283

ritvik77/ContributionChartHuggingFace
It's Ready!

One feature Hugging Face could really benefit from is a contribution heatmap — a visual dashboard to track user engagement and contributions across models, datasets, and models over the year, similar to GitHub’s contribution graph. Guess what, Clem Delangue mentioned idea about using HF API reference for it and we made it for use.

If you are a Hugging Face user add this Space in your collection and it will give you all stats about your contributions and commits nearly same as GitHub. It's still a prototype and still working on it as a product feature.

5 replies

·

replied to ritvik77's post 26 days ago

good

reacted to luigi12345's post with 👍 27 days ago

Post

3446

🧠 PROMPT FOR CONVERTING ANY MODEL IN REASONING "THINKING" MODEL🔥🤖
Convert any model to Deepseek R1 like "thinking" model. 💭

You're now a thinking-first LLM. For all inputs:

1. Start with <thinking>
   - Break down problems step-by-step
   - Consider multiple approaches
   - Calculate carefully
   - Identify errors
   - Evaluate critically
   - Explore edge cases
   - Check knowledge accuracy
   - Cite sources when possible

2. End with </thinking>

3. Then respond clearly based on your thinking.

The <thinking> section is invisible to users and helps you produce better answers.

For math: show all work and verify
For coding: reason through logic and test edge cases
For facts: verify information and consider reliability
For creative tasks: explore options before deciding
For analysis: examine multiple interpretations

Example:
<thinking>
[Step-by-step analysis]
[Multiple perspectives]
[Self-critique]
[Final conclusion]
</thinking>

[Clear, concise response to user]

4 replies

·

reacted to mlabonne's post with 👍 28 days ago

Post

9233

✂️ AutoAbliteration

I made a Colab notebook to automatically abliterate models.

It's quite general, so you can do interesting stuff like blocking a given language in the model outputs.

💻 Colab: https://colab.research.google.com/drive/1RmLv-pCMBBsQGXQIM8yF-OdCNyoylUR1?usp=sharing

replied to Dragunflie-420's post 29 days ago

说不如做,尝试一个你擅长的领域,在这个领域内做一个AI产品,然后把这个卖出去:)

reacted to Dragunflie-420's post with 👀 29 days ago

Post

2120

Hello community. My name is nikki and I am looking to form a team for a serious project build platform/design/idea/project's...Ive been creating AI professional personas with custom skill sets and divisions of expertise. I want to create a viable business. Ive been working hard but i admit theres so much i do not have time to learn to do. Its taken me three years to learn enough to be here. I dont have a big set up in fact im cloud and ide space trial enterprise here and there all for space. I suck at execution and thats because I dont know how really. I need help from a person. AI has done all it can without hands. Im blabbering at this point. Have nothing big techy to say other than I build and ideate all day hmu glad to meet some like minded individuals ...seriously! Teach me leave me feeling confident in our collaborations not the need to build security software....poor attemt at hacking humor...im neither a comedian or hacker lol....full stacker yep:)

12 replies

·

posted an update about 1 month ago

Post

3911

😊 This program is designed to remove emojis from a given text. It uses a regular expression (regex) pattern to match and replace emojis with an empty string, effectively removing them from the text. The pattern includes a range of Unicode characters that correspond to various types of emojis, such as emoticons, symbols, and flags. By using this program, you can clean up text data by removing any emojis that may be present, which can be useful for text processing, analysis, or other applications where emojis are not desired. 💻

import re

def remove_emojis(text):
    # Define a broader emoji pattern
    emoji_pattern = re.compile(
        "["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        u"\U00002702-\U000027B0"
        u"\U000024C2-\U0001F251"
        u"\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
        u"\U0001FA00-\U0001FA6F"  # chess symbols and more emojis
        u"\U0001FA70-\U0001FAFF"  # more symbols and pictographs
        u"\U00002600-\U000026FF"  # miscellaneous symbols
        u"\U00002B50-\U00002B59"  # additional symbols
        u"\U0000200D"             # zero width joiner
        u"\U0000200C"             # zero width non-joiner
        u"\U0000FE0F"             # emoji variation selector
        "]+", flags=re.UNICODE
    )
    return emoji_pattern.sub(r'', text)

posted an update about 1 month ago

Post

1178

一个加入水印的小程序

from PIL import Image, ImageDraw, ImageFont

def add_watermark(image):
    watermark_text = "AI Generated by DarkIdol FeiFei"

    # Ensure the input is an Image object
    if not isinstance(image, Image.Image):
        raise ValueError("Input must be a PIL Image object")

    width, height = image.size

    # Create a drawing object to draw on the image
    draw = ImageDraw.Draw(image)

    # Set the font size for the watermark text
    font_size = 10  # Set font size to 10
    try:
        # Try to use a common font file
        font = ImageFont.truetype("Iansui-Regular.ttf", font_size)
    except IOError:
        # Use the default font if the specified font file is not found
        font = ImageFont.load_default()

    # Calculate the width and height of the watermark text using textbbox
    bbox = draw.textbbox((0, 0), watermark_text, font=font)
    text_width = bbox[2] - bbox[0]
    text_height = bbox[3] - bbox[1]

    # Calculate the position for the watermark text (bottom-right corner)
    x = width - text_width - 10  # 10 is the right margin
    y = height - text_height - 10  # 10 is the bottom margin

    # Add the watermark text to the image
    draw.text((x, y), watermark_text, font=font, fill=(255, 255, 255, 128))

    # Return the modified image object
    return image

- 字体从https://fonts.google.com去找就可以了,程序都标注清楚了,自行修改

1 reply

·

reacted to m-ric's post with 👍 4 months ago

Post

2590

𝐇𝐮𝐠𝐠𝐢𝐧𝐠 𝐅𝐚𝐜𝐞 𝐫𝐞𝐥𝐞𝐚𝐬𝐞𝐬 𝐏𝐢𝐜𝐨𝐭𝐫𝐨𝐧, 𝐚 𝐦𝐢𝐜𝐫𝐨𝐬𝐜𝐨𝐩𝐢𝐜 𝐥𝐢𝐛 𝐭𝐡𝐚𝐭 𝐬𝐨𝐥𝐯𝐞𝐬 𝐋𝐋𝐌 𝐭𝐫𝐚𝐢𝐧𝐢𝐧𝐠 𝟒𝐃 𝐩𝐚𝐫𝐚𝐥𝐥𝐞𝐥𝐢𝐳𝐚𝐭𝐢𝐨𝐧 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons "

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is infamously hard to do, making for bloated code repos that hold together only by magic.

🤏 𝗕𝘂𝘁 𝗻𝗼𝘄 𝘄𝗲 𝗱𝗼𝗻'𝘁 𝗻𝗲𝗲𝗱 𝗵𝘂𝗴𝗲 𝗿𝗲𝗽𝗼𝘀 𝗮𝗻𝘆𝗺𝗼𝗿𝗲! Instead of building mega-training codes, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. A team has built Nanotron, already widely used in industry.
And now a team releases Picotron, a radical approach to code 4D Parallelism in just a few hundred lines of code, a real engineering prowess, making it much easier to understand what's actually happening!

⚡ 𝗜𝘁'𝘀 𝘁𝗶𝗻𝘆, 𝘆𝗲𝘁 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹:
Counting in MFU (Model FLOPs Utilization, how much the model actually uses all the compute potential), this lib reaches ~50% on SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is leading further benchmarks to verify this)

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron

1 reply

·

aifeifei PRO

AI & ML interests

Recent Activity

Organizations

aifeifei798's activity