---
license: mit
---
# SnakeAI
PPO Snake AI report & weights after training
## Intro
This experiment aims to train an artificial intelligence agent to play the Snake game using deep reinforcement learning algorithms (DQN and PPO). The agent (i.e., the snake) operates within a game environment whose state includes the coordinates of the snake's head, the list of body-segment coordinates, the direction of the snake's head, and the coordinates of the food. The reward mechanism is based on scores for eating food, winning, or losing. The experiment uses the PyGame framework for environment simulation and adjusts the reward parameters (keeping the reward for eating food constant while gradually increasing the penalty for death) to observe training outcomes. The results show that increasing the death penalty leads to higher average scores, while a strategy with a lower death penalty performs poorly during training but well in demonstrations. Future work will attempt to optimize the snake's movement by adding penalties for excessive zigzagging and by integrating the saved model into a C++ framework.
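As a rough illustration, the reward shaping described above can be sketched as follows. The function and constant names are hypothetical (they do not come from the released code), and the values follow the round-3 settings (+2.0 for eating food, -1.5 for hitting a wall, -2.0 for biting its own body, assuming that is what `Reward_hit` and `Reward_bit` denote).

```python
# Hypothetical sketch of the per-step reward shaping; names and the
# wall/self-collision interpretation are assumptions, values are round-3.
REWARD_EAT = 2.0    # snake eats a piece of food
REWARD_HIT = -1.5   # snake hits a wall (assumed meaning of Reward_hit)
REWARD_BIT = -2.0   # snake bites its own body (assumed meaning of Reward_bit)


def step_reward(ate_food: bool, hit_wall: bool, bit_self: bool) -> float:
    """Return the shaped reward for one environment step."""
    if hit_wall:
        return REWARD_HIT
    if bit_self:
        return REWARD_BIT
    return REWARD_EAT if ate_food else 0.0
```

Under this scheme, only terminal collisions and food events carry reward; ordinary moves score zero, which matches the report's description of rewards tied to eating, winning, or losing.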
## Usage
### Download
```python
from modelscope import snapshot_download
model_dir = snapshot_download('Genius-Society/SnakeAI')
```
### Maintenance
```bash
git clone git@hf.co:Genius-Society/SnakeAI
cd SnakeAI
```
## Training curve
| Round | 1 | 2 | 3 |
| :----------- | :--------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------------------------------------------------------------------: |
| Training curve | ![round1](https://user-images.githubusercontent.com/20459298/233120722-d300c250-a07e-44c1-8986-d1f26d48c0f8.png) | ![round2](https://user-images.githubusercontent.com/20459298/233120780-43c9b35b-def6-4a57-b7b4-6599ad594c5c.png) | ![round3](https://user-images.githubusercontent.com/20459298/233120831-deb18303-25ec-4ff8-bafc-4726d1a81af4.png) |
| Evaluation | ![round1](https://user-images.githubusercontent.com/20459298/233120884-b0ea6080-8aa4-4382-9ce5-90c22737cdf3.gif) | ![round2](https://user-images.githubusercontent.com/20459298/233121028-f9431608-3833-49d5-9cde-573fdb82c692.gif) | ![round3](https://user-images.githubusercontent.com/20459298/233121080-9a4f2e95-0f49-40cf-91a4-f7f57d4b861f.gif) |
| Reward_eat | +2.0 | +2.0 | +2.0 |
| Reward_hit | -0.5 | -1.0 | -1.5 |
| Reward_bit | -0.8 | -1.5 | -2.0 |
| Avg record | β‰ˆ19 | β‰ˆ23 | β‰ˆ28 |
## Mirror
<https://www.modelscope.cn/models/Genius-Society/SnakeAI>
## Reference
[1] <https://github.com/Genius-Society/SnakeAI>