license: apache-2.0

Overview

PODsys focuses on AI cluster deployment. It provides a complete toolchain covering infrastructure installation, environment deployment, user management, system monitoring, and resource scheduling, with the aim of offering an open-source, efficient, compatible, and easy-to-use solution for deploying intelligent cluster environments.

To achieve this, PODsys integrates dozens of drivers, software packages, and other installation packages required for AI cluster deployment, and provides a set of scripts that simplify deployment. With these tools, users can deploy an entire cluster with a few simple commands.

  • Environment deployment and management: PODsys provides tools for quickly installing, configuring, and updating cluster environments. It bundles the operating system, NVIDIA drivers, InfiniBand drivers, and other necessary base software packages to give users a complete GPU cluster environment. Users can manage cluster nodes, add or remove nodes, and monitor node status and performance with simple commands.

  • User management and permission control: PODsys has a comprehensive user management and permission control mechanism. Administrators can create and manage user accounts and assign different permissions and resource quotas. This lets resources be allocated flexibly to each user or team while keeping the cluster secure.

  • System monitoring and performance optimization: PODsys provides system monitoring and performance optimization capabilities that help users track cluster status and performance indicators in real time. Through a visual interface, users can view cluster resource usage, job execution, and performance bottlenecks, so they can adjust cluster configurations and optimize job performance promptly.

  • Resource scheduling and job management: PODsys provides resource scheduling and job management functions that automatically schedule and manage jobs according to users' needs, improving cluster resource utilization and job execution efficiency.
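
To make the interplay between per-user resource quotas and job scheduling concrete, here is a minimal sketch in Python. It is not PODsys code and does not reflect how PODsys implements scheduling, which this README does not describe. The `Node`, `Job`, and `Scheduler` names, the first-fit placement policy, and the GPU-count quota are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_gpus: int


@dataclass
class Job:
    user: str
    gpus: int


@dataclass
class Scheduler:
    """Hypothetical first-fit scheduler with per-user GPU quotas."""

    nodes: list[Node]
    quotas: dict[str, int]  # maximum GPUs each user may hold at once
    in_use: dict[str, int] = field(default_factory=dict)

    def submit(self, job: Job) -> str | None:
        """Return the name of the node the job was placed on, or None."""
        used = self.in_use.get(job.user, 0)
        # Users without a quota entry are treated as having a quota of 0.
        if used + job.gpus > self.quotas.get(job.user, 0):
            return None  # the job would exceed the user's quota
        for node in self.nodes:
            if node.free_gpus >= job.gpus:
                node.free_gpus -= job.gpus
                self.in_use[job.user] = used + job.gpus
                return node.name
        return None  # no single node has enough free GPUs


if __name__ == "__main__":
    sched = Scheduler(
        nodes=[Node("node01", 8), Node("node02", 8)],
        quotas={"alice": 8, "bob": 4},
    )
    print(sched.submit(Job("alice", 6)))  # node01
    print(sched.submit(Job("alice", 4)))  # None: 6 + 4 exceeds alice's quota of 8
    print(sched.submit(Job("bob", 4)))    # node02: node01 has only 2 GPUs free
```

First-fit is used here only because it is the simplest policy to read. A production scheduler would typically also handle job queues, priorities, and releasing resources when jobs finish.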

User Guide

Please visit the official website at https://podsys.ai/