CodeS is a series of Code LLMs specifically optimized for SQL generation.

The CodeS series comes in four scales: 1B, 3B, 7B, and 15B. CodeS-1B, 3B, and 7B are incrementally pre-trained on top of StarCoderBase-1B, 3B, and 7B, respectively, and support a maximum sequence length of 8,192 tokens. CodeS-15B is derived from StarCoder-15B and supports sequences of up to 6,144 tokens.
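The sequence limits above matter in practice because a Text-to-SQL prompt (question plus database schema) and the generated SQL share the same context window. The following is a minimal sketch of a budget check, not part of the CodeS codebase: the names `CODES_MAX_TOKENS`, `prompt_budget`, and `fits`, as well as the default of 256 generated tokens, are illustrative assumptions. Only the limits themselves come from the text above.

```python
# Context windows (in tokens) as stated in this model card.
CODES_MAX_TOKENS = {
    "CodeS-1B": 8192,
    "CodeS-3B": 8192,
    "CodeS-7B": 8192,
    "CodeS-15B": 6144,
}


def prompt_budget(model_name: str, max_new_tokens: int = 256) -> int:
    """Return how many prompt tokens fit after reserving room for generation.

    `max_new_tokens` defaults to 256 purely for illustration; it is an
    assumption, not a value recommended by the CodeS authors.
    """
    limit = CODES_MAX_TOKENS[model_name]
    if not 0 <= max_new_tokens < limit:
        raise ValueError(f"max_new_tokens must be in [0, {limit})")
    return limit - max_new_tokens


def fits(model_name: str, prompt_token_count: int, max_new_tokens: int = 256) -> bool:
    """Check whether a prompt of the given token count fits the model's window."""
    return prompt_token_count <= prompt_budget(model_name, max_new_tokens)


if __name__ == "__main__":
    # A 7,000-token prompt fits the 8,192-token models but not CodeS-15B.
    for name in CODES_MAX_TOKENS:
        print(name, prompt_budget(name), fits(name, 7000))
```

The token count passed to `fits` would come from the model's own tokenizer. If a prompt does not fit, the usual remedy is to shorten the schema portion of the prompt rather than truncate the question.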

We have demonstrated that CodeS achieves new state-of-the-art performance on two challenging Text-to-SQL benchmarks: Spider and Bird.

For more details about how to use CodeS, please refer to our GitHub page: https://github.com/RUCKBReasoning/codes.

(This repository hosts CodeS-15B.)

