AGENTS Guide - audio-embeddings
This file is for coding agents working in this repository. Follow these repo-specific rules over generic defaults.
1) Environment Snapshot
- Python: >=3.12 (from pyproject.toml).
- Dependency manager: uv.
- Main stack: PyTorch, PyTorch Lightning, Hydra, OmegaConf.
- Project root marker: .project-root.
- Main entrypoint: src/train.py.
2) Cursor / Copilot Rule Files
- Checked .cursor/rules/: not present.
- Checked .cursorrules: not present.
- Checked .github/copilot-instructions.md: not present.
- Therefore, no additional Cursor/Copilot rule files are currently enforced.
3) Install / Setup Commands
uv sync
uv run <command>
uv add <package>
4) Build / Train / Eval Commands
There is no separate "build" step (this is a training codebase). Use quick-run training as the integration sanity check.
uv run src/train.py
uv run src/train.py trainer.fast_dev_run=True
uv run src/train.py trainer=cpu trainer.fast_dev_run=True
uv run src/train.py experiment=local/audio_jepa
uv run src/train.py trainer.max_epochs=10 data.batch_size=32 model.optimizer.lr=1e-4
Cluster-style execution (existing project pattern):
srun .venv/bin/python -u -O src/train.py experiment=cluster_jepa_audioset_rope +trainer.max_time="00:19:50:00"
5) Lint / Formatting / Static Checks
Use the commands below as pragmatic checks:
uv run pre-commit run --all-files
uv run pre-commit run ruff --all-files
uv run pre-commit run ruff-format --all-files
uv run python -m compileall src
Ruff is configured via .pre-commit-config.yaml and runs both lint fixes and formatting.
6) Test Commands (Including Single Test)
Primary validation in this repo is script-based verification under tests/.
Run test files directly as native Python files:
uv run tests/verify_rope.py
uv run tests/verify_custom_rope.py
uv run tests/verify_data.py
Useful single-file checks (native execution):
uv run src/train.py trainer.fast_dev_run=True
uv run src/train.py trainer=cpu trainer.fast_dev_run=True
uv run scripts/verify_shapes.py
uv run scripts/verify_scheduler.py
Notes:
- tests/test_*.py are pytest-style and are not part of the default native-file workflow.
- Prefer tests/verify_*.py and scripts/verify_*.py for lightweight checks.
7) Repository Architecture Expectations
- configs/: Hydra composition (trainer/data/model/logger/callbacks/experiment).
- src/train.py: orchestration only (instantiate and run).
- src/models/: LightningModules (high-level training logic).
- src/models/components/: reusable nn.Module building blocks.
- src/data/: DataModules/Datasets and collate logic.
- src/utils/: logging, instantiation, wrappers, scheduler helpers.

When possible, prefer config changes over hardcoded Python changes.
8) Code Style Guidelines
Imports
- Group imports as: standard library -> third-party -> local src.*.
- Keep one import per line unless importing multiple names from the same module.
- Avoid wildcard imports.
- Prefer absolute imports from src....
Formatting
- Use 4-space indentation and readable line lengths.
- Keep functions small; extract helpers for complex logic.
- Do not introduce unrelated reformatting in touched files.
- Keep comments for non-obvious intent, not obvious mechanics.
Typing
- Type hints are expected for function arguments and return values.
- Use concrete tensor/container types when practical.
- Use Optional[T] / T | None consistently within a file.
- For dict-like configs, type as DictConfig when passing Hydra config objects.
Naming
- snake_case: functions, variables, module filenames.
- PascalCase: classes (AudioJEPAModule, AudioSetDataModule).
- UPPER_SNAKE_CASE: constants.
- Prefer descriptive names (mask_indices) over short names (m2) except local math temporaries.
PyTorch / Lightning / Hydra Conventions
- Keep heavy compute out of __init__ where possible.
- forward() for inference logic; training behavior in training_step().
- Use self.log(...) with explicit flags (on_step, on_epoch, prog_bar, batch_size).
- Instantiate components through Hydra (hydra.utils.instantiate).
- Expose tunable parameters in config files, not hardcoded literals.
Error Handling and Validation
- Raise informative ValueError/RuntimeError for invalid config/state.
- Validate critical tensor assumptions with assertions or explicit checks.
- Prefer logger/warnings over bare print() in new code.
- For file I/O, prefer pathlib.Path and existence checks.
Data and Paths
- Do not hardcode absolute machine paths.
- Use rootutils.setup_root(..., indicator=".project-root", pythonpath=True) in entrypoints/scripts when needed.
- Respect cfg.paths.* outputs for logs/checkpoints/artifacts.
9) Agent Workflow Rules
- Reuse existing components before adding new abstractions.
- Keep src/train.py generic; place model/data logic in dedicated modules.
- Prefer minimal, focused diffs.
- Update configs and docs when behavior changes.
- Validate with the smallest meaningful command first (fast_dev_run, single test), then broader checks.
10) Git / Change Hygiene
- Do not revert unrelated local changes.
- Keep commits scoped to one concern.
- Write clear commit messages describing intent.
- Prefer Conventional Commit-like format: type(scope): intent.
- Common types in this repo: feat, fix, conf, build, docs, style, chore.
- Never commit secrets, credentials, or environment-specific absolute paths.
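The commit format above can be checked mechanically. The sketch below is a hypothetical helper, not an existing hook in this repository, and it is stricter than the guideline: it requires a scope and accepts only the types listed above.

```python
import re

# Types listed in this guide; the guide calls them "common", not exhaustive.
COMMIT_TYPES = ("feat", "fix", "conf", "build", "docs", "style", "chore")

# Matches "type(scope): intent" on a single subject line.
COMMIT_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(COMMIT_TYPES)})\((?P<scope>[^()\s]+)\): (?P<intent>\S.*)$"
)


def is_valid_commit_subject(subject: str) -> bool:
    """Return True if the subject follows the type(scope): intent shape."""
    return COMMIT_PATTERN.match(subject) is not None
```

For example, `is_valid_commit_subject("fix(data): handle empty audio clips")` passes, while a bare subject such as "update stuff" does not.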
11) Practical Agent Defaults
- Prefer reusing existing modules over creating new abstractions.
- Keep edits local to the requested change; avoid drive-by refactors.
- Run the smallest useful verification command after changes.
- If you touch training logic, run at least one fast training sanity check.
- If you touch model components, run relevant verify script(s) in tests/.
- If you touch Hydra config wiring, run a config-backed entry command via uv run src/train.py ....
12) Common Pitfalls
- Avoid hardcoding data paths; use config (cfg.paths, data config fields).
- Avoid printing in new code paths; use ranked loggers/warnings.
- Avoid putting heavy tensor compute in constructors.
- Avoid bypassing Hydra by manually instantiating configurable components.
- Avoid changing unrelated formatting in files you touch.