philschmid (HF staff) committed
Commit 4703356
1 parent: e380c4b

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ L4/engines/rank0.engine filter=lfs diff=lfs merge=lfs -text
L4/LICENSE ADDED
@@ -0,0 +1,114 @@
+ LLAMA 3.1 COMMUNITY LICENSE AGREEMENT
+ Llama 3.1 Version Release Date: July 23, 2024
+
+ “Agreement” means the terms and conditions for use, reproduction, distribution and modification of the
+ Llama Materials set forth herein.
+
+ “Documentation” means the specifications, manuals and documentation accompanying Llama 3.1
+ distributed by Meta at https://llama.meta.com/doc/overview.
+
+ “Licensee” or “you” means you, or your employer or any other person or entity (if you are entering into
+ this Agreement on such person or entity’s behalf), of the age required under applicable laws, rules or
+ regulations to provide legal consent and that has legal authority to bind your employer or such other
+ person or entity if you are entering in this Agreement on their behalf.
+
+ “Llama 3.1” means the foundational large language models and software and algorithms, including
+ machine-learning model code, trained model weights, inference-enabling code, training-enabling code,
+ fine-tuning enabling code and other elements of the foregoing distributed by Meta at
+ https://llama.meta.com/llama-downloads.
+
+ “Llama Materials” means, collectively, Meta’s proprietary Llama 3.1 and Documentation (and any
+ portion thereof) made available under this Agreement.
+
+ “Meta” or “we” means Meta Platforms Ireland Limited (if you are located in or, if you are an entity, your
+ principal place of business is in the EEA or Switzerland) and Meta Platforms, Inc. (if you are located
+ outside of the EEA or Switzerland).
+
+ By clicking “I Accept” below or by using or distributing any portion or element of the Llama Materials,
+ you agree to be bound by this Agreement.
+
+ 1. License Rights and Redistribution.
+
+ a. Grant of Rights. You are granted a non-exclusive, worldwide, non-transferable and royalty-free
+ limited license under Meta’s intellectual property or other rights owned by Meta embodied in the Llama
+ Materials to use, reproduce, distribute, copy, create derivative works of, and make modifications to the
+ Llama Materials.
+
+ b. Redistribution and Use.
+
+ i. If you distribute or make available the Llama Materials (or any derivative works
+ thereof), or a product or service (including another AI model) that contains any of them, you shall (A)
+ provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with
+ Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use
+ the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or
+ otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at
+ the beginning of any such AI model name.
+
+ ii. If you receive Llama Materials, or any derivative works thereof, from a Licensee as part
+ of an integrated end user product, then Section 2 of this Agreement will not apply to you.
+
+ iii. You must retain in all copies of the Llama Materials that you distribute the following
+ attribution notice within a “Notice” text file distributed as a part of such copies: “Llama 3.1 is
+ licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights
+ Reserved.”
+
+ iv. Your use of the Llama Materials must comply with applicable laws and regulations
+ (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Llama
+ Materials (available at https://llama.meta.com/llama3_1/use-policy), which is hereby incorporated by
+ reference into this Agreement.
+
+ 2. Additional Commercial Terms. If, on the Llama 3.1 version release date, the monthly active users
+ of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700
+ million monthly active users in the preceding calendar month, you must request a license from Meta,
+ which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the
+ rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
+
+ 3. Disclaimer of Warranty. UNLESS REQUIRED BY APPLICABLE LAW, THE LLAMA MATERIALS AND ANY
+ OUTPUT AND RESULTS THEREFROM ARE PROVIDED ON AN “AS IS” BASIS, WITHOUT WARRANTIES OF
+ ANY KIND, AND META DISCLAIMS ALL WARRANTIES OF ANY KIND, BOTH EXPRESS AND IMPLIED,
+ INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OF TITLE, NON-INFRINGEMENT,
+ MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR
+ DETERMINING THE APPROPRIATENESS OF USING OR REDISTRIBUTING THE LLAMA MATERIALS AND
+ ASSUME ANY RISKS ASSOCIATED WITH YOUR USE OF THE LLAMA MATERIALS AND ANY OUTPUT AND
+ RESULTS.
+
+ 4. Limitation of Liability. IN NO EVENT WILL META OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF
+ LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, ARISING
+ OUT OF THIS AGREEMENT, FOR ANY LOST PROFITS OR ANY INDIRECT, SPECIAL, CONSEQUENTIAL,
+ INCIDENTAL, EXEMPLARY OR PUNITIVE DAMAGES, EVEN IF META OR ITS AFFILIATES HAVE BEEN ADVISED
+ OF THE POSSIBILITY OF ANY OF THE FOREGOING.
+
+ 5. Intellectual Property.
+
+ a. No trademark licenses are granted under this Agreement, and in connection with the Llama
+ Materials, neither Meta nor Licensee may use any name or mark owned by or associated with the other
+ or any of its affiliates, except as required for reasonable and customary use in describing and
+ redistributing the Llama Materials or as set forth in this Section 5(a). Meta hereby grants you a license to
+ use “Llama” (the “Mark”) solely as required to comply with the last sentence of Section 1.b.i. You will
+ comply with Meta’s brand guidelines (currently accessible at
+ https://about.meta.com/brand/resources/meta/company-brand/ ). All goodwill arising out of your use
+ of the Mark will inure to the benefit of Meta.
+
+ b. Subject to Meta’s ownership of Llama Materials and derivatives made by or for Meta, with
+ respect to any derivative works and modifications of the Llama Materials that are made by you, as
+ between you and Meta, you are and will be the owner of such derivative works and modifications.
+
+ c. If you institute litigation or other proceedings against Meta or any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Llama Materials or Llama 3.1 outputs or
+ results, or any portion of any of the foregoing, constitutes infringement of intellectual property or other
+ rights owned or licensable by you, then any licenses granted to you under this Agreement shall
+ terminate as of the date such litigation or claim is filed or instituted. You will indemnify and hold
+ harmless Meta from and against any claim by any third party arising out of or related to your use or
+ distribution of the Llama Materials.
+
+ 6. Term and Termination. The term of this Agreement will commence upon your acceptance of this
+ Agreement or access to the Llama Materials and will continue in full force and effect until terminated in
+ accordance with the terms and conditions herein. Meta may terminate this Agreement if you are in
+ breach of any term or condition of this Agreement. Upon termination of this Agreement, you shall delete
+ and cease use of the Llama Materials. Sections 3, 4 and 7 shall survive the termination of this
+ Agreement.
+
+ 7. Governing Law and Jurisdiction. This Agreement will be governed and construed under the laws of
+ the State of California without regard to choice of law principles, and the UN Convention on Contracts
+ for the International Sale of Goods does not apply to this Agreement. The courts of California shall have
+ exclusive jurisdiction of any dispute arising out of this Agreement.
L4/engines/config.json ADDED
@@ -0,0 +1,157 @@
+ {
+   "version": "0.13.0.dev2024090300",
+   "pretrained_config": {
+     "mlp_bias": false,
+     "attn_bias": false,
+     "rotary_base": 500000.0,
+     "rotary_scaling": {
+       "factor": 8.0,
+       "low_freq_factor": 1.0,
+       "high_freq_factor": 4.0,
+       "original_max_position_embeddings": 8192,
+       "rope_type": "llama3"
+     },
+     "residual_mlp": false,
+     "disable_weight_only_quant_plugin": false,
+     "moe": {
+       "num_experts": 0,
+       "top_k": 0,
+       "normalization_mode": null,
+       "sparse_mixer_epsilon": 0.01,
+       "tp_mode": 0
+     },
+     "remove_duplicated_kv_heads": false,
+     "architecture": "LlamaForCausalLM",
+     "dtype": "bfloat16",
+     "vocab_size": 128256,
+     "hidden_size": 4096,
+     "num_hidden_layers": 32,
+     "num_attention_heads": 32,
+     "hidden_act": "silu",
+     "logits_dtype": "float32",
+     "norm_epsilon": 1e-05,
+     "position_embedding_type": "rope_gpt_neox",
+     "max_position_embeddings": 131072,
+     "num_key_value_heads": 8,
+     "intermediate_size": 14336,
+     "mapping": {
+       "world_size": 1,
+       "gpus_per_node": 8,
+       "cp_size": 1,
+       "tp_size": 1,
+       "pp_size": 1,
+       "moe_tp_size": 1,
+       "moe_ep_size": 1
+     },
+     "quantization": {
+       "quant_algo": null,
+       "kv_cache_quant_algo": null,
+       "group_size": 128,
+       "smoothquant_val": 0.5,
+       "clamp_val": null,
+       "has_zero_point": false,
+       "pre_quant_scale": false,
+       "exclude_modules": null
+     },
+     "use_parallel_embedding": false,
+     "embedding_sharding_dim": 0,
+     "share_embedding_table": false,
+     "head_size": 128,
+     "qk_layernorm": false
+   },
+   "build_config": {
+     "max_input_len": 131072,
+     "max_seq_len": 131072,
+     "opt_batch_size": 8,
+     "max_batch_size": 32,
+     "max_beam_width": 1,
+     "max_num_tokens": 262144,
+     "opt_num_tokens": 32,
+     "max_prompt_embedding_table_size": 0,
+     "kv_cache_type": "PAGED",
+     "gather_context_logits": false,
+     "gather_generation_logits": false,
+     "strongly_typed": true,
+     "builder_opt": 3,
+     "force_num_profiles": null,
+     "profiling_verbosity": "layer_names_only",
+     "enable_debug_output": false,
+     "max_draft_len": 0,
+     "speculative_decoding_mode": 1,
+     "use_refit": false,
+     "input_timing_cache": null,
+     "output_timing_cache": null,
+     "lora_config": {
+       "lora_dir": [],
+       "lora_ckpt_source": "hf",
+       "max_lora_rank": 64,
+       "lora_target_modules": [],
+       "trtllm_modules_to_hf_modules": {}
+     },
+     "auto_parallel_config": {
+       "world_size": 1,
+       "gpus_per_node": 8,
+       "cluster_key": null,
+       "cluster_info": null,
+       "sharding_cost_model": "alpha_beta",
+       "comm_cost_model": "alpha_beta",
+       "enable_pipeline_parallelism": false,
+       "enable_shard_unbalanced_shape": false,
+       "enable_shard_dynamic_shape": false,
+       "enable_reduce_scatter": true,
+       "builder_flags": null,
+       "debug_mode": false,
+       "infer_shape": true,
+       "validation_mode": false,
+       "same_buffer_io": {},
+       "same_spec_io": {},
+       "sharded_io_allowlist": [],
+       "fill_weights": false,
+       "parallel_config_cache": null,
+       "profile_cache": null,
+       "dump_path": null,
+       "debug_outputs": []
+     },
+     "weight_sparsity": false,
+     "weight_streaming": false,
+     "plugin_config": {
+       "dtype": "bfloat16",
+       "bert_attention_plugin": "auto",
+       "gpt_attention_plugin": "auto",
+       "gemm_plugin": "auto",
+       "gemm_swiglu_plugin": null,
+       "fp8_rowwise_gemm_plugin": null,
+       "smooth_quant_gemm_plugin": null,
+       "identity_plugin": null,
+       "layernorm_quantization_plugin": null,
+       "rmsnorm_quantization_plugin": null,
+       "nccl_plugin": null,
+       "lookup_plugin": null,
+       "lora_plugin": null,
+       "weight_only_groupwise_quant_matmul_plugin": null,
+       "weight_only_quant_matmul_plugin": null,
+       "quantize_per_token_plugin": false,
+       "quantize_tensor_plugin": false,
+       "moe_plugin": "auto",
+       "mamba_conv1d_plugin": "auto",
+       "low_latency_gemm_plugin": null,
+       "context_fmha": true,
+       "bert_context_fmha_fp32_acc": false,
+       "paged_kv_cache": true,
+       "remove_input_padding": true,
+       "reduce_fusion": false,
+       "enable_xqa": false,
+       "tokens_per_block": 64,
+       "use_paged_context_fmha": true,
+       "use_fp8_context_fmha": false,
+       "multiple_profiles": false,
+       "paged_state": false,
+       "streamingllm": false,
+       "manage_weights": false,
+       "use_fused_mlp": true
+     },
+     "use_strip_plan": false,
+     "max_encoder_input_len": 1,
+     "use_fused_mlp": true
+   }
+ }
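
The engine config above fixes the shapes that determine KV cache size. As a rough sketch (not TensorRT-LLM code), the arithmetic below estimates the cache footprint from `num_hidden_layers`, `num_key_value_heads`, `head_size`, `tokens_per_block` and `max_seq_len`. It assumes the cache is stored in bfloat16 at 2 bytes per element, since `kv_cache_quant_algo` is null, and it treats a "block" as 64 tokens across all layers; how the runtime actually lays out blocks is not stated in the config.

```python
# Values copied from pretrained_config / build_config / plugin_config.
NUM_LAYERS = 32        # num_hidden_layers
NUM_KV_HEADS = 8       # num_key_value_heads
HEAD_SIZE = 128        # head_size
TOKENS_PER_BLOCK = 64  # tokens_per_block
MAX_SEQ_LEN = 131072   # max_seq_len
BYTES_PER_ELEM = 2     # ASSUMPTION: bfloat16 KV cache (kv_cache_quant_algo is null)


def kv_bytes_per_token(layers, kv_heads, head_size, bytes_per_elem):
    """Bytes of key plus value cache one token occupies across all layers."""
    return 2 * layers * kv_heads * head_size * bytes_per_elem


per_token = kv_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_SIZE, BYTES_PER_ELEM)
per_block = per_token * TOKENS_PER_BLOCK
per_max_sequence = per_token * MAX_SEQ_LEN

print(per_token // 2**10, "KiB per token")                  # 128 KiB
print(per_block // 2**20, "MiB per 64-token block")         # 8 MiB
print(per_max_sequence // 2**30, "GiB at max_seq_len")      # 16 GiB
```

Under these assumptions a single sequence at the full 131072-token length would need about as much memory for its KV cache as the engine file itself occupies, which is worth keeping in mind when reading `max_batch_size` and `max_num_tokens` as upper bounds rather than expected operating points.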
L4/engines/generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "bos_token_id": 128000,
+   "do_sample": true,
+   "eos_token_id": [
+     128001,
+     128008,
+     128009
+   ],
+   "temperature": 0.6,
+   "top_p": 0.9,
+   "transformers_version": "4.46.1"
+ }
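
To make the sampling fields above concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering, using `temperature` 0.6, `top_p` 0.9 and the three `eos_token_id` values from this file. The function names `top_p_distribution` and `sample` are hypothetical, and this is an illustration of the technique, not the transformers or TensorRT-LLM implementation.

```python
import math
import random

EOS_TOKEN_IDS = (128001, 128008, 128009)  # eos_token_id from generation_config.json


def top_p_distribution(logits, temperature=0.6, top_p=0.9):
    """Return {token_id: prob} after temperature scaling and nucleus filtering."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Renormalise so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample(logits, rng=random):
    """Draw one token id and report whether it is an end-of-sequence token."""
    dist = top_p_distribution(logits)
    ids = list(dist)
    token = rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
    return token, token in EOS_TOKEN_IDS


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # toy 4-token vocabulary
    print(top_p_distribution(toy_logits))
    print(sample(toy_logits))
```

With the toy logits, the two most likely tokens already cover more than 90% of the probability mass after temperature scaling, so only they remain candidates; a lower temperature sharpens the distribution and shrinks the nucleus further.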
L4/engines/rank0.engine ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4c4671e9b9c7a4a0fb3839bb733b277b5609482ee4c7dfbfcf24bda10938897
+ size 16132622740
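
The three lines added for `rank0.engine` are a Git LFS pointer, not the engine itself: the repository stores only the spec version, a SHA-256 object id and the byte size. A small sketch of reading those fields follows; the helper name `parse_lfs_pointer` is hypothetical and the parsing assumes the simple `key value` line layout shown in the diff.

```python
# Pointer contents as added in the diff, without the leading "+ " markers.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b4c4671e9b9c7a4a0fb3839bb733b277b5609482ee4c7dfbfcf24bda10938897
size 16132622740
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into typed fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["digest"][:12])
print(f"{info['size'] / 2**30:.2f} GiB")  # roughly 15 GiB
```

The size field is what tells a reader, before downloading anything, that the serialized engine is about 16.1 GB (decimal), which is why the commit also adds an LFS rule for this path to `.gitattributes`.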
L4/generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "bos_token_id": 128000,
+   "do_sample": true,
+   "eos_token_id": [
+     128001,
+     128008,
+     128009
+   ],
+   "temperature": 0.6,
+   "top_p": 0.9,
+   "transformers_version": "4.46.1"
+ }
README.md ADDED
@@ -0,0 +1,17 @@
+ ---
+ language:
+ - en
+ library_name: trtllm
+ tags:
+ - model_hub_mixin
+ - optimum-nvidia
+ - trtllm
+ ---
+
+ This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+ - Library: https://github.com/huggingface/optimum-nvidia
+ - Docs: https://huggingface.co/docs/optimum/nvidia_overview
+
+
+
+ huggingface-cli upload my-cool-model . .
config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "executor_config": null,
+   "load_engines": false
+ }