
🤖 Vector Search Agent: An Intelligent Search Engine Backed by the Hugging Face Hub

Reference: Martin Elstner, Author: 안정

Search engines broadly fall into two categories: keyword search and vector search. Unlike keyword search, vector search requires you to take care of two things:

  1. Embedding the dataset and the query with a suitable embedding model
  2. A database that can handle the embedding data

However, vector search based on embeddings alone makes it hard to guarantee that users get the answer they are actually looking for.

If an agent can reason and optimize autonomously during the search process, we can get results that are much closer to the user's intent.

What makes the agentic approach different

Traditional vector search workflow

Data ➡ Embed data (fixed model) ➡ Build index ➡ User query ➡ Similarity search ➡ Answer

The smarter, agentic approach

Analyze the user query (form a search strategy) ➡ Select the best embedding model ➡ Embed data ➡ Build index ➡ Similarity search ➡ Refine the answer based on the search results

Why DuckDB?

Datasets on the Hugging Face Hub are backed by Parquet files, and DuckDB, a fast in-memory database system, can interact with those files directly. One of DuckDB's features is vector similarity search, which works with or without an index.
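
As a quick illustration (a minimal sketch, separate from the agent built below): recent DuckDB releases can read a Hub dataset's Parquet files directly through hf:// paths. The glob pattern and file layout here are assumptions and may differ for a given repository.

import duckdb

# Peek at a Hub dataset's Parquet files directly (requires a DuckDB version with hf:// support).
duckdb.sql("""
    SELECT *
    FROM 'hf://datasets/huggingface-KREW/KoCultre-Descriptions/**/*.parquet'
    LIMIT 5;
""").show()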

In this notebook, we will build a simple agentic vector search engine by giving a single agent several tools.

Install the required dependencies:

# This example requires Python 3.10 or later.
!pip install -U smolagents datasets sentence-transformers duckdb openai

Log in to use the Hugging Face Inference API:

from huggingface_hub import notebook_login

notebook_login()
from smolagents import tool
from datasets import Dataset
import os

# ๋„๊ตฌ ์‚ฌ์šฉ์„ ์œ„ํ•ด OPENAI ํ‚ค๋ฅผ ๋ฐœ๊ธ‰ ๋ฐ›์•„์•ผํ•ฉ๋‹ˆ๋‹ค.
os.environ["OPENAI_API_KEY"] = "YOUR KEY"

๋„๊ตฌ ์ •์˜

We will define the following tools:

  • Embedding creation tool
  • Index creation tool
  • Similarity search tool
  • Answer generation tool

๋„๊ตฌ1 : ์ž„๋ฒ ๋”ฉ ์ƒ์„ฑ

Normally, an embedding job chunks the data and processes it in small batches, but here we will simply convert the dataset into embeddings.

@tool
def create_embeddings(
    dataset: Dataset,
    model_id: str,
    column_name: str,
    ) -> Dataset:
  """
    Create embeddings for the given dataset.

    Args:
        dataset: The dataset to create embeddings for.
        model_id: The model to use for embedding.
        column_name: The name of the column to embed.

    Returns:
        The dataset with an embeddings column added.
  """
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer(model_id)

  def embed_batch(batch):
    embeddings = model.encode(batch[column_name], convert_to_numpy=True)
    batch["embeddings"] = embeddings.tolist()
    return batch

  dataset = dataset.map(embed_batch, batched=True)

  return dataset
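
Tools created with @tool remain callable objects, so you can sanity-check them outside the agent. A minimal sketch, assuming the dataset loaded later in this notebook (with a train split) and an illustrative model choice:

# Hypothetical direct call, bypassing the agent.
embedded = create_embeddings(
    dataset=dataset["train"],
    model_id="sentence-transformers/all-MiniLM-L6-v2",
    column_name="content",
)
print(embedded)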

Performing vector search with DuckDB

We can use duckdb to run vector search over the dataset, either with or without an index. Searching without an index is slower but exact, while searching with an index is faster but approximate.

Searching without an index

Searching without an index is a slower operation, but for small datasets (up to roughly 100k rows) it is usually fast enough. In this notebook, however, we will search using a DuckDB index.
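
For completeness, here is what a brute-force (no-index) search could look like. This is a minimal sketch: dataset_with_embeddings stands for the output of the embedding tool, and the model id must match the one used to embed the data.

import duckdb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
query_vec = model.encode("your query here").tolist()
dim = model.get_sentence_embedding_dimension()

# Full scan over the raw embeddings, no HNSW index involved.
duckdb.register("memes_raw", dataset_with_embeddings.to_pandas())
results = duckdb.sql(f"""
    SELECT *, array_cosine_distance(embeddings::FLOAT[{dim}], {query_vec}::FLOAT[{dim}]) AS distance
    FROM memes_raw
    ORDER BY distance
    LIMIT 5;
""").to_df()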

Searching with an index

This approach creates a local copy of the dataset and uses it to build an index. There is a bit of up-front overhead, but once the index is built, searches become noticeably faster.

๋„๊ตฌ2 : DuckDB ์ธ๋ฑ์Šค ๋งŒ๋“ค๊ธฐ

@tool
def create_db_index(
    dataset_with_embeddings: Dataset,  # dataset that already contains embeddings
    table_name: str,
    embedding_column: str = "embeddings"
) -> None:
    """
    Create a DuckDB index for a dataset that contains embeddings.

    Args:
        dataset_with_embeddings: The dataset that already contains embeddings.
        table_name: The name of the table to create.
        embedding_column: The name of the embedding column.

    Returns:
        None
    """
    import duckdb
    
    # Install and load the VSS extension
    duckdb.sql("INSTALL vss; LOAD vss;")
    duckdb.sql(f"DROP TABLE IF EXISTS {table_name};")

    # Convert the dataset to a pandas DataFrame
    df = dataset_with_embeddings.to_pandas()

    # Register the DataFrame with DuckDB
    duckdb.register(f"{table_name}_temp", df)

    # Get the embedding dimension from the data
    embedding_dim = len(df[embedding_column].iloc[0])
    # embedding_dim = model.get_sentence_embedding_dimension()

    # Create the table (cast the embeddings to a fixed-size FLOAT array)
    duckdb.sql(f"""
        CREATE TABLE {table_name} AS
        SELECT *, {embedding_column}::FLOAT[{embedding_dim}] AS {embedding_column}_float
        FROM {table_name}_temp;
    """)

    # Create the HNSW index
    duckdb.sql(f"""
        CREATE INDEX idx_{embedding_column} ON {table_name}
        USING HNSW ({embedding_column}_float) WITH (metric = 'cosine');
    """)

    # Clean up the temporary registration
    duckdb.sql(f"DROP VIEW IF EXISTS {table_name}_temp;")

With this tool we can run vector search against the index, and results come back almost instantly.

๋„๊ตฌ3 : ๋ฒกํ„ฐ ๊ฒ€์ƒ‰ ์ˆ˜ํ–‰ํ•˜๊ธฐ

@tool
def similarity_search_with_duckdb_index(
    query: str,
    table_name: str,
    model_id: str,
    k: int = 5,
    embedding_column: str = "embeddings"
) -> dict:
    """
    Perform vector search using the DuckDB index.

    Args:
        query: The query string to search for.
        table_name: The name of the table to search.
        model_id: The model to use for embedding the query.
        k: The number of results to return.
        embedding_column: The name of the embedding column.

    Returns:
        The top-k most similar rows with their cosine distances.
    """
    from sentence_transformers import SentenceTransformer
    import duckdb
    
    model = SentenceTransformer(model_id)
    embedding = model.encode(query).tolist()
    return duckdb.sql(
        query=f"""
        SELECT *, array_cosine_distance({embedding_column}_float, {embedding}::FLOAT[{model.get_sentence_embedding_dimension()}]) as distance
        FROM {table_name}
        ORDER BY distance
        LIMIT {k};
    """
    ).to_df()
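
Outside the agent loop, the three tools so far can also be chained by hand. A minimal sketch, assuming the dataset loaded later in this notebook (with a train split); the model id and table name are illustrative:

# Hypothetical manual pipeline: embed -> index -> search, without the agent.
embedded = create_embeddings(dataset["train"], "sentence-transformers/all-MiniLM-L6-v2", "content")
create_db_index(embedded, table_name="memes")
results = similarity_search_with_duckdb_index(
    query="아이돌 관련 밈 알려주세요",  # "Tell me about memes related to idols"
    table_name="memes",
    model_id="sentence-transformers/all-MiniLM-L6-v2",
    k=5,
)
print(results)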

There is no need to deploy a heavyweight vector search engine separately, and storage is handled by the Hub.

๋„๊ตฌ4 : ๋‹ต๋ณ€ ์ƒ์„ฑ ๋„๊ตฌ

Based on the chunks returned by the similarity search, an LLM generates the kind of answer the user is likely looking for.

@tool
def generate_answer(chunks: list, query: str) -> str:
    """
    Generate an answer from the given list of text chunks, based on the query.

    Args:
        chunks: The list of text chunks to use when generating the answer.
        query: The query string to answer.

    Returns:
        str: The generated answer.
    """
    from openai import OpenAI   # requires the OpenAI API key set above

    client = OpenAI()
    context = "\n\n".join(chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
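
For reference, the answer tool could then consume the search results like this. A minimal sketch: results stands for the DataFrame returned by the search tool, and the content column name is an assumption about the dataset.

# Hypothetical usage: turn the top-k rows into chunks and ask for a refined answer.
chunks = results["content"].tolist()
answer = generate_answer(chunks=chunks, query="아이돌 관련 밈 알려주세요")
print(answer)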

๋„๊ตฌ๋ฅผ ์ •์˜ํ•˜์˜€์œผ๋‹ˆ, ๋ฐ์ดํ„ฐ์…‹์„ ๋กœ๋“œํ•˜๊ณ  ์—์ด์ „ํŠธ๋ฅผ ๋™์ž‘ํ•ด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

The dataset we use is huggingface-KREW/KoCultre-Descriptions, which contains data about Korean memes.

from datasets import load_dataset

dataset = load_dataset("huggingface-KREW/KoCultre-Descriptions")

Define the agent. The model we use is Qwen/Qwen2.5-Coder-32B-Instruct.

from smolagents import CodeAgent, InferenceClientModel

model = InferenceClientModel(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    provider="together",
    max_tokens=2048
)

tools = [
    create_embeddings,                    # embedding creation tool
    create_db_index,                      # index creation tool
    similarity_search_with_duckdb_index,  # similarity search tool
    generate_answer                       # answer generation tool
]

agent = CodeAgent(
    model=model,
    tools=tools,
    verbosity_level=2,
    max_steps=15
)

Let's build a prompt that lets the agent choose an embedding model on its own.

def agentic_prompt(query: str):
  return f"""
      You are an intelligent search expert. {query}:

      To perform the search, first analyze the query and the dataset, then select the most suitable embedding model based on that analysis:
        - sentence-transformers/all-MiniLM-L6-v2 (fast, lightweight, good for simple queries)
        - sentence-transformers/all-mpnet-base-v2 (balanced performance)
        - intfloat/e5-large-v2 (high quality for complex tasks)
        - minishlab/potion-base-8M (very efficient)
      """

Provide a description of the dataset and a search query, then run the search-engine agent.

result = agent.run(
    agentic_prompt(query="아이돌 관련 밈 알려주세요"),  # "Tell me about memes related to idols"
    additional_args={"dataset": dataset,
                     "dataset_description": "A dataset of Korean memes: the title column gives the meme and the content column explains what it means.",
                     "column_name": "content"}
)
>>> print(f"Final result: {result}")
Final result: The '엄마가되' meme is an expression that has recently been trending on social media and in online communities, used mainly when someone finds a person so cute or lovable that they feel an almost motherly affection. It is often used in idol fandom culture; '엄마가 돼' is the correct spelling, but the deliberate misspelling is what gives the meme its distinctive character.

Conclusion

Rather than simply returning raw search results, the agent refined an answer to the query based on those results. ✌🏻

So far we have built a simple agent system. Adding things like quality evaluation and result analysis on top of it would take this toward a truly agentic search engine.
