
Phase 2: Benchmark Problem-Solving System Complete

✅ Implemented Components

1. BenchmarkProblemLoader

  • File: absolute_zero_reasoner/testtime/benchmark_loader.py
  • Features:
    • Loads HumanEval+ and MBPP+ problems
    • Extracts test cases by parsing assert statements
    • Validates solutions (syntax check + execution)
    • Supports batch loading and reports problem statistics
  • Based on: an extension of the existing load_humaneval_problem function
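The assert parsing and the two-stage validation (syntax, then execution) listed above can be sketched as follows. This is a minimal illustration, not the code in benchmark_loader.py: the function names `extract_test_cases` and `validate_solution` are hypothetical, only asserts of the form `f(<literals>) == <literal>` are handled, and the solution is executed with plain `exec` without any sandboxing.

```python
import ast


def extract_test_cases(test_code, entry_point):
    """Collect (args, expected) pairs from `assert f(...) == value` lines.

    Hypothetical helper: only literal arguments and literal expected
    values are supported; every other assert is skipped.
    """
    cases = []
    for node in ast.walk(ast.parse(test_code)):
        if not isinstance(node, ast.Assert):
            continue
        test = node.test
        is_simple_call = (
            isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.left, ast.Call)
            and isinstance(test.left.func, ast.Name)
            and test.left.func.id == entry_point
        )
        if not is_simple_call:
            continue
        try:
            args = [ast.literal_eval(arg) for arg in test.left.args]
            expected = ast.literal_eval(test.comparators[0])
        except (ValueError, TypeError):
            continue  # non-literal operands are out of scope for this sketch
        cases.append((args, expected))
    return cases


def validate_solution(solution, entry_point, cases):
    """Return True if the solution parses and passes every extracted case."""
    try:
        ast.parse(solution)  # stage 1: syntax check
    except SyntaxError:
        return False
    namespace = {}
    try:
        exec(solution, namespace)  # stage 2: execution (no sandbox here)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        return False


# Usage with a made-up problem (not taken from MBPP+).
tests = "assert add(1, 2) == 3\nassert add(-1, 1) == 0"
cases = extract_test_cases(tests, "add")
ok = validate_solution("def add(a, b):\n    return a + b", "add", cases)
```

Parsing with `ast` rather than regular expressions keeps nested literals such as lists and tuples intact. A real loader would also need to handle HumanEval-style `check(candidate)` wrappers, which this sketch ignores.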

2. InitialSolutionGenerator

  • File: absolute_zero_reasoner/testtime/solution_generator.py
  • Features:
    • AZR-style model loading (flash attention, gradient checkpointing)
    • Greedy generation (identical to AZR evaluation)
    • Automatic recovery of missing function definitions
    • Fallback solution generation (per-problem templates)
  • Based on: the existing generate_initial_solution function, refactored into a class
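One plausible reading of "automatic recovery of missing function definitions" is that the model sometimes returns only a function body, and the signature has to be restored from the prompt. The sketch below works under that assumption. The helper name `recover_function_definition` is hypothetical, it handles single-line signatures only, and it may differ from what solution_generator.py actually does.

```python
import re
import textwrap


def recover_function_definition(prompt, completion, entry_point):
    """Return code defining `entry_point`, restoring the signature if needed.

    Assumption: when the completion lacks `def entry_point(`, it is a bare
    function body and the prompt contains the single-line signature.
    """
    name = re.escape(entry_point)
    if re.search(rf"^\s*def\s+{name}\s*\(", completion, re.MULTILINE):
        return completion  # the definition is already present

    signature = re.search(
        rf"^def\s+{name}\s*\([^\n]*\)[^\n]*:[ \t]*$", prompt, re.MULTILINE
    )
    if signature is None:
        raise ValueError(f"no signature for {entry_point!r} found in prompt")

    # Keep everything up to the signature so that imports used in type
    # annotations survive; the prompt's docstring is dropped.
    header = prompt[: signature.end()].rstrip()
    body = textwrap.indent(textwrap.dedent(completion).strip("\n"), "    ")
    return f"{header}\n{body}\n"


# Usage with a made-up prompt and a body-only completion.
prompt = (
    "from typing import List\n\n"
    "def total(xs: List[int]) -> int:\n"
    '    """Return the sum of xs."""\n'
)
completion = "    return sum(xs)"
code = recover_function_definition(prompt, completion, "total")
```

Re-indenting with `textwrap.dedent` followed by `textwrap.indent` normalizes bodies that arrive with inconsistent leading whitespace, which is a common failure mode of greedy decoding on completion-style prompts.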

3. TestTimeLogger

  • File: absolute_zero_reasoner/testtime/logger.py
  • Features:
    • Requirement 1: benchmark problem + LLM answer + correctness
    • Requirement 2: IPO extraction + task generation logs
    • Requirement 3: task accuracy + reward logs
    • Requirement 4: VeRL training progress logs
    • Structured logs saved in JSON format
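A structured JSON logger of the kind described above could look like the sketch below. The class name `JsonCategoryLogger`, the one-file-per-category JSON Lines layout, the category name, and the record fields are all assumptions made for illustration; the actual TestTimeLogger interface is not shown in this report.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class JsonCategoryLogger:
    """Append structured records to one JSON Lines file per category.

    Hypothetical stand-in for TestTimeLogger; names and layout are assumed.
    """

    def __init__(self, log_dir):
        self.log_dir = Path(log_dir)
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, category):
        return self.log_dir / f"{category}.jsonl"

    def log(self, category, **fields):
        """Write one timestamped record and return it."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "category": category,
            **fields,
        }
        with self._path(category).open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        return record

    def read(self, category):
        """Load every record previously written for a category."""
        with self._path(category).open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh]


# Usage: log one benchmark result (Requirement 1) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    logger = JsonCategoryLogger(tmp)
    logger.log(
        "benchmark_eval",  # assumed category name
        problem_id="Mbpp/2",
        llm_answer="def f(): ...",
        correct=True,
    )
    records = logger.read("benchmark_eval")
```

Appending one JSON object per line keeps each write independent, so a crashed run leaves earlier records readable and the log can be loaded incrementally.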

4. Configuration System

  • File: absolute_zero_reasoner/testtime/config.py
  • Classes: TestTimeConfig, BenchmarkConfig
  • Features: AZR-compatible settings plus TestTime-specific settings

🧪 Test Results

Basic functionality tests (✅ 3/3 passed)

Configuration: ✅ PASS
Logger: ✅ PASS
BenchmarkLoader: ✅ PASS

Verified Features

  • ✅ MBPP problem loading (Mbpp/2 loaded successfully)
  • ✅ Problem statistics (378 problems confirmed)
  • ✅ Logging system (5 categories)
  • ✅ Configuration management (AZR-compatible)

📁 Generated Structure

TestTime-RLVR-v2/absolute_zero_reasoner/testtime/
├── __init__.py                # package initialization
├── config.py                  # configuration classes
├── benchmark_loader.py        # benchmark loader
├── solution_generator.py      # solution generator
└── logger.py                  # logging system

🗑️ Cleanup

  • ✅ Deleted Python cache files (__pycache__, *.pyc)
  • ✅ Cleaned up unnecessary imports (components not yet implemented are commented out)
  • ✅ Test files stored temporarily in /tmp/azr/

🎯 Next Steps (Phase 3)

IPO Triple extraction system to be implemented in Phase 3:

  1. IPOTripleExtractor - IPO extraction built on the AZR Python Executor
  2. TripleValidator - validation of the extracted triples
  3. AZR integration - uses utils/code_utils/python_executor.py

AZR Components to Reuse

  • absolute_zero_reasoner/utils/code_utils/python_executor.py - code execution
  • absolute_zero_reasoner/trainer/ppo/azr_ray_trainer.py:641-655 - IPO generation logic
  • absolute_zero_reasoner/rewards/reward_managers.py:220-233 - validation logic
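To make the Phase 3 plan concrete, the sketch below shows one way extraction and validation of triples could work, assuming IPO stands for (input, program, output). It does not use the AZR Python Executor: plain `exec` stands in for it here, and the names `IPOTriple`, `extract_ipo_triples`, and `validate_triple` are hypothetical rather than the planned IPOTripleExtractor and TripleValidator classes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class IPOTriple:
    """One (program, input, output) record; field names are assumed."""
    program: str
    args: tuple
    output: Any


def _load_function(program, entry_point):
    # Stand-in for a sandboxed executor: runs the program in a fresh namespace.
    namespace = {}
    exec(program, namespace)
    return namespace[entry_point]


def extract_ipo_triples(program, entry_point, inputs):
    """Run the program on each input and record the observed output."""
    func = _load_function(program, entry_point)
    triples = []
    for args in inputs:
        try:
            output = func(*args)
        except Exception:
            continue  # inputs that crash the program yield no triple
        triples.append(IPOTriple(program, tuple(args), output))
    return triples


def validate_triple(triple, entry_point):
    """Re-execute the program and confirm it reproduces the stored output."""
    func = _load_function(triple.program, entry_point)
    try:
        return func(*triple.args) == triple.output
    except Exception:
        return False


# Usage with a made-up program; the second input raises and is skipped.
program = "def add(a, b):\n    return a + b"
triples = extract_ipo_triples(program, "add", [(1, 2), ("a", 1)])
all_valid = all(validate_triple(t, "add") for t in triples)
```

Re-running the program during validation acts as a simple determinism check: a triple whose output cannot be reproduced is not a reliable training signal.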

Created: 2025-07-16
Status: ✅ Complete
Tests: ✅ Passed (3/3)