BaoLuo-LawAssistant-sftglm-6b: BaoLuo (宝锣) Legal Large Language Model 1.0

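Introduction

The BaoLuo Legal LLM is an open-source Chinese legal dialogue language model based on an Encoder-Decoder design. It is fine-tuned on open-source legal-domain data and can provide services such as retrieval of laws and regulations, legal consultation, case analysis, and charge prediction. It builds on the General Language Model (GLM) architecture and is fine-tuned from ChatGLM, and users can deploy it locally on consumer-grade GPUs. This project may not be used commercially; it is available for research use.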

Software Dependencies

pip install protobuf==3.20.0 "transformers>=4.27.1" icetk cpm_kernels torch==2.0.1
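
After installation, it can help to confirm which versions were actually picked up. The following is a minimal sketch using only the Python standard library; `installed_versions` is a hypothetical helper name, not part of this project, and it assumes the distribution names match the ones in the pip command above (a package reported as missing may simply be registered under a differently normalized name).

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Package names taken from the pip command above
REQUIRED = ("protobuf", "transformers", "icetk", "cpm_kernels", "torch")


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if not found."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# Print a short report, one package per line
for name, version in installed_versions().items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

The report only lists versions; comparing them against the pinned values (for example protobuf 3.20.0) is left to the reader.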

Code Usage

The BaoLuo-LawAssistant-sftglm-6b model can be called with the following code to generate a conversation:


>>> import os
>>> import torch
>>> from transformers import AutoTokenizer, AutoModel, AutoConfig

>>> # Load the ChatGLM-6B base model; pre_seq_len=256 sets the P-Tuning prefix length
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
>>> config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=256)
>>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True).half().cuda()

>>> # The paths below are read as a local directory containing this repository's files
>>> model = model.quantize(bits=8, kernel_file="xuanxuanzl/BaoLuo-LawAssistant-sftglm-6b/quantization_kernels.so")
>>> prefix_state_dict = torch.load(os.path.join("xuanxuanzl/BaoLuo-LawAssistant-sftglm-6b", "pytorch_model.bin"))
>>> # Keep only the prefix-encoder weights and strip the key prefix
>>> new_prefix_state_dict = {}
>>> for k, v in prefix_state_dict.items():
...     if k.startswith("transformer.prefix_encoder."):
...         new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
...
>>> model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)
>>> model.transformer.prefix_encoder.float()
>>> model = model.eval()

>>> # "你好" means "Hello"
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
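
The key-filtering loop in the snippet above can be isolated into a small helper, which makes it possible to check the logic without loading the model. This is a minimal sketch: `strip_prefix_keys` is a hypothetical name rather than part of transformers or ChatGLM, and plain strings stand in for the tensors a real checkpoint would hold.

```python
from typing import Any, Dict

# Key prefix used by the prefix-encoder weights in the snippet above
PREFIX = "transformer.prefix_encoder."


def strip_prefix_keys(state_dict: Dict[str, Any], prefix: str = PREFIX) -> Dict[str, Any]:
    """Keep only entries whose key starts with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


# Stand-in checkpoint: the values are placeholders, not real tensors
fake_checkpoint = {
    "transformer.prefix_encoder.embedding.weight": "prefix-weights",
    "lm_head.weight": "unrelated-weights",
}
print(strip_prefix_keys(fake_checkpoint))
```

With a real checkpoint, the returned dictionary is what would be passed to `model.transformer.prefix_encoder.load_state_dict`.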

License

The code in this repository is open-sourced under the Apache-2.0 license, while use of the ChatGLM-6B model weights must comply with the Model License.

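Areas for Improvement

The base model is not very efficient, which leads to long response times; the next step is to adopt a more efficient base model.
The data distribution across the service functions is imbalanced.
The key instructions for each service's data are insufficiently designed.
The use of external knowledge to improve the accuracy of model output is still lacking.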

Changelog

July 10, 2023: BaoLuo Legal LLM V1.0 released; the BaoLuo Legal AI Assistant was released on the same day.