Model Card for Model ID

Expert-pruned GLM-4.7-Flash for Japanese→English subtitle translation. Routed-expert usage was measured on subtitle data and the least-used experts dropped (64 → 32 per layer). ~16B parameters.

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: [More Information Needed]
Funded by [optional]: [More Information Needed]
Shared by [optional]: [More Information Needed]
Model type: [More Information Needed]
Language(s) (NLP): Japanese, English
License: MIT
Finetuned from model [optional]: GLM-4.7-Flash

Uses

Direct Use

[More Information Needed]

Downstream Use [optional]

[More Information Needed]

Out-of-Scope Use

[More Information Needed]

Bias, Risks, and Limitations

[More Information Needed]

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

Training Details

Training Data

[More Information Needed]

Training Procedure

Preprocessing [optional]

[More Information Needed]

Training Hyperparameters

Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[More Information Needed]

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

Technical Specifications [optional]

Model Architecture and Objective

[More Information Needed]

Compute Infrastructure

[More Information Needed]

Hardware

[More Information Needed]

Software

[More Information Needed]

Citation / Data Attribution

Partial calibration data: JESC (Japanese-English Subtitle Corpus), licensed under CC BY 4.0. Source: https://nlp.stanford.edu/projects/jesc/

@ARTICLE{pryzant_jesc_2018, author = {{Pryzant}, R. and {Chung}, Y. and {Jurafsky}, D. and {Britz}, D.}, title = "{JESC: Japanese-English Subtitle Corpus}", journal = {Language Resources and Evaluation Conference (LREC)}, keywords = {Computer Science - Computation and Language}, year = 2018 }

BibTeX:

[More Information Needed]

APA:

[More Information Needed]

Glossary [optional]

[More Information Needed]

More Information [optional]

[More Information Needed]

Model Card Authors [optional]

TBA

Downloads last month: 15

Safetensors

Model size

16B params

Tensor type

F32

BF16

Model tree for klhashim/glm-4.7-flash-JP-EN-prune

Base model

zai-org/GLM-4.7-Flash

Finetuned

(67)

this model