
Good parameters to run the model

#2
by dzyla - opened

I've been playing with the model in LM Studio but was unable to find settings that reliably generate output. If someone could suggest some, I would appreciate it.

So far I tried:
n_batch: 512
n_ctx: 64000/128000
rope_freq_scale: 1/0.25
rope_freq_base: 10000
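
For reference, here is roughly how those LM Studio fields would map onto a llama.cpp invocation. This is an illustrative sketch, not a known-good recipe; the model path is an example, and newer llama.cpp builds read the YaRN RoPE parameters from the GGUF metadata automatically, so the RoPE flags may be unnecessary:

```shell
# Illustrative mapping of the LM Studio fields to llama.cpp flags.
# 0.0625 = 8192 / 131072, i.e. the 16x YaRN extension from the
# original 8K context; 1 or 0.25 would mis-scale RoPE for this model.
./main -m models/yarn-mistral-7b-128k.Q8_0.gguf \
  -c 65536 -b 512 \
  --rope-freq-base 10000 --rope-freq-scale 0.0625
```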

Thanks!

Same here, I just get junk no matter what I input. I've tried a bunch of settings now.

Don't need anything special.

./main -t 16 -m models/yarn-mistral-7b-128k.Q8_0.gguf -c 65536 -n 64 --top-k 1 -p 'AI is going to'
[... omit ...]
llm_load_print_meta: rope scaling     = yarn
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 0.0625
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = yes
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q8_0
llm_load_print_meta: model params     = 7.24 B
llm_load_print_meta: model size       = 7.17 GiB (8.50 BPW) 
llm_load_print_meta: general.name   = nousresearch_yarn-mistral-7b-128k
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.11 MB
llm_load_tensors: mem required  = 7338.75 MB
...................................................................................................
llama_new_context_with_model: n_ctx      = 65536
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 0.0625
llama_new_context_with_model: kv self size  = 8192.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 4254.63 MB

system_info: n_threads = 16 / 32 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: 
        repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 1, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 65536, n_batch = 512, n_predict = 64, n_keep = 0


AI is going to be a big part of the future. It’s already here, but it’s not yet fully realized. The technology has been around for decades and is now being used in many different ways.

The most common use of AI is in the form of chatbots. These are computer programs that can sim
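
As a sanity check, the 8192.00 MB "kv self size" in the log above is exactly what the printed model shape predicts: K and V caches at f16 (2 bytes per value), across every layer and KV head, with head_dim = n_embd / n_head = 4096 / 32 = 128:

```shell
# KV-cache size in MB: 2 (K and V) * n_layer * n_ctx * n_head_kv
#                      * head_dim * 2 bytes (f16)
echo $(( 2 * 32 * 65536 * 8 * 128 * 2 / 1024 / 1024 ))   # prints 8192
```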

@mljxy Here is the problem: you only predicted 64 tokens, which is nothing. Try with -n -1 and you will see the model's output turn to garbage.
Check here:

./main -t 16 -m ~/models/yarn-mistral-7b-128k.Q8_0.gguf -c 65536 -n -1 --top-k 1 -p 'AI is going to' -ngl 128
Log start
main: build = 1453 (9a3b4f6)
main: built with cc (Debian 10.2.1-6) 10.2.1 20210110 for x86_64-linux-gnu
main: seed  = 1700099052
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA L4, compute capability 8.9
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /home/ashutosh/models/yarn-mistral-7b-128k.Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q8_0     [  4096, 32000,     1,     1 ]
llama_model_loader: - tensor    1:              blk.0.attn_q.weight q8_0     [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    2:              blk.0.attn_k.weight q8_0     [  4096,  1024,     1,     1 ]
llama_model_loader: - tensor    3:              blk.0.attn_v.weight q8_0     [  4096,  1024,     1,     1 ]
.
.
.
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q8_0:  226 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 131072
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = mostly Q8_0
llm_load_print_meta: model params     = 7.24 B
llm_load_print_meta: model size       = 7.17 GiB (8.50 BPW) 
llm_load_print_meta: general.name   = nousresearch_yarn-mistral-7b-128k
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.10 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  =  132.91 MB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 35/35 layers to GPU
llm_load_tensors: VRAM used: 7205.83 MB
...................................................................................................
llama_new_context_with_model: n_ctx      = 65536
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 8192.00 MB
llama_new_context_with_model: kv self size  = 8192.00 MB
llama_build_graph: non-view tensors processed: 740/740
llama_new_context_with_model: compute buffer total size = 4254.13 MB
llama_new_context_with_model: VRAM scratch buffer: 4248.00 MB
llama_new_context_with_model: total VRAM used: 19645.83 MB (model: 7205.83 MB, context: 12440.00 MB)

system_info: n_threads = 16 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 0 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
sampling: 
        repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
        top_k = 1, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
        mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 65536, n_batch = 512, n_predict = -1, n_keep = 0


AI is going to be a big part of the future. It’s already here, and it’s only going to get more prevalent as time goes on.

The question is: how do you make sure that your business is ready for this new era?

In this blog post, we will discuss some tips on how you can prepare yourself and your company for the future of AI.

## What is Artificial Intelligence (AI)?

Artificial intelligence (AI) is a branch of computer science that deals with the creation of intelligent machines that work and react like humans. The term “artificial intelligence” was coined by John McCarthy in 1956.

AI has been around for decades, but it’s only recently that we’ve seen a surge in interest and investment in this field. This is due to the fact that AI is now more accessible than ever before. With the advent of cloud computing and big data, companies can now easily access the resources they need to build and train AI models.

There are many different types of AI, but some of the most common include:

- Machine learning: This is a type of AI that allows machines to learn from data without being explicitly programmed.
- Natural language processing (NLP): This is a branch of AI that deals with the interaction between computers and human languages.
- Computer vision: This is a field of AI that deals with the ability of computers to interpret and understand images.
- Robotics: This is a field of AI that deals with the design, construction, and operation of robots.

## How can you prepare your business for the future of AI?

There are many different ways that businesses can prepare for the future of AI. Here are some tips:

### 1. Start by understanding what AI is and how it works.

This will help you to better understand the potential benefits and risks associated with this technology.

### 2. Assess your current business processes and identify areas where AI could be used to improve efficiency or productivity.

For example, if you have a lot of data that needs to be analyzed, AI can help you to do this more quickly and accurately than humans.

### 3. Invest in the right tools and resources.

This includes things like cloud computing, big data, and machine learning platforms. These will give you the ability to build and train your own AI models.

### 4. Develop a strategy for how you will use AI in your business.

This should include things like what types of tasks you want to automate, how you will collect and manage data, and how you will ensure that your AI systems are safe and secure.

### 5. Train your employees on the basics of AI so they can understand how it works and how it can be used in their jobs.

This will help them to better adapt to changes in the workplace as AI becomes more prevalent.

### 6. Stay up-to-date with the latest developments in AI.

This includes things like new technologies, research findings, and industry trends. This will help you to stay ahead of the curve and make sure that your business is always prepared for what’s coming next.

## What are some of the benefits of using AI in business?

There are many different benefits that businesses can enjoy by using AI. Some of these include:

- Increased efficiency and productivity: AI can help businesses to automate tasks, which can free up employees to focus on more important things.
- Improved decision-making: AI can help businesses to make better decisions by providing them with data-driven insights.
- Enhanced customer experience: AI can help businesses to provide a more personalized and engaging customer experience.
- Increased competitiveness: By using AI, businesses can gain a competitive edge over their rivals.

## What are some of the risks associated with using AI in business?

There are also some risks that businesses need to be aware of when it comes to using AI. Some of these include:

- Security and privacy concerns: AI systems can be vulnerable to hacking and other security threats. This is why it’s important to ensure that your AI systems are properly secured.
- Bias in data: If the data used to train an AI system is biased, then the resulting model will also be biased. This can lead to unfair or discriminatory decisions being made by the AI system.
- Loss of control: Once an AI system is deployed, it can be difficult to control what it does. This is why it’s important to have a clear understanding of how your AI systems work and what they are capable of.

## How can you ensure that your AI systems are safe and secure?

There are a few things that businesses can do to ensure that their AI systems are safe and secure:

- Use encryption: This will help to protect the data used to train your AI models from being accessed by unauthorized parties.
- Implement access controls: This will help to limit who has access to your AI systems and what they can do with them.
- Conduct regular security audits: This will help you to identify any potential vulnerabilities in your AI systems before they can be exploited by hackers.
- Stay up-to-date with the latest security threats: This will help you to stay ahead of the curve and make sure that your AI systems are always protected from the latest attacks.

## What are some of the most common applications of AI in business?

There are many different ways that businesses can use AI. Some of the most common applications include:

- Customer service: AI can be used to provide customers with personalized and engaging customer service experiences.
- Marketing: AI can help businesses to target their marketing efforts more effectively by providing them with data-driven insights into what their customers want and need.
- Sales: AI can help sales teams to close deals faster by providing them with real-time insights into the progress of each deal.
- Human resources: AI can be used to automate tasks such as recruiting, onboarding, and performance management.
- Finance: AI can help businesses to improve their financial operations by providing them with data-driven insights into things like cash flow, budgeting, and forecasting.

## What are some of the most promising areas for future research in AI?

There are many different areas that researchers are currently exploring when it comes to AI. Some of the most promising include:

- Natural language processing (NLP): This is a field of AI that deals with the interaction between computers and human languages. NLP can be used to improve things like customer service, marketing, and sales.
- Computer vision: This is a field of AI that deals with the ability of computers to interpret and understand images. Computer vision can be used to improve things like security, surveillance, and autonomous vehicles.
- Robotics: This is a field of AI that deals with the design, construction, and operation of robots. Robotics can be used to improve things like manufacturing, logistics, and healthcare.
- Machine learning: This is a type of AI that allows machines to learn from data without being explicitly programmed. Machine learning can be used to improve things like customer service, marketing, and sales.

## What are some of the biggest challenges facing businesses when it comes to using AI?

There are many different challenges that businesses need to overcome when it comes to using AI. Some of the most common include:

- Lack of data: In order for an AI system to be effective, it needs to have access to a large amount of high-quality data. This can be difficult to obtain, especially if you’re working in a niche industry.
- Limited computing power: AI systems require a lot of computing power in order to function properly. This can be expensive, especially if you’re running your own servers.
- Lack of expertise: In order to get the most out of an AI system, you need to have a team of experts who understand how it works and what it’s capable of. This can be difficult to find, especially if you’re working in a small or medium-sized business.
- Security concerns: As we mentioned earlier, AI systems can be vulnerable to hacking and other security threats. This is why it’s important to ensure that your AI systems are properly secured.

## How can businesses overcome these challenges?

There are many different ways that businesses can overcome the challenges associated with using AI. Some of the most common include:

- Partnering with other companies: By partnering with other companies, you can pool your resources and expertise to create a more effective AI system.
- Outsourcing: If you don’t have the in-house expertise to build and maintain an AI system, you can always outsource this work to a third-party provider.
- Using cloud computing: Cloud computing can help businesses to overcome the challenges associated with limited computing power and security concerns.
- Investing in research and development: By investing in R&D, businesses can stay ahead of the curve and ensure that their AI systems are always up-to-date.

## What is the future of AI?

The future of AI is very bright. As we mentioned earlier, there are many different ways that businesses can use AI to improve their operations. In the coming years, we expect to see even more innovative applications of this technology.

If you’re interested in learning more about AI and how it can benefit your business, be sure to check out our blog post on the topic. We’ll be covering everything from the basics of AI to some of the most promising areas for future research. Thanks for reading! [end of text]

llama_print_timings:        load time =    8573.26 ms
llama_print_timings:      sample time =    1123.22 ms /  2063 runs   (    0.54 ms per token,  1836.68 tokens per second)
llama_print_timings: prompt eval time =      54.43 ms /     5 tokens (   10.89 ms per token,    91.86 tokens per second)
llama_print_timings:        eval time =   78847.17 ms /  2062 runs   (   38.24 ms per token,    26.15 tokens per second)
llama_print_timings:       total time =   80786.01 ms
Log end


Why are you using -c 65536 instead of -c 131072, given that it's a 128K-context model?
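
One practical consideration here (my guess, not the poster's stated reason): KV-cache memory grows linearly with context, so the full 131072 costs twice the 8192 MB seen in the logs, before the larger compute buffer is counted:

```shell
# KV-cache MB at the full 131072 context, same model shape as above
# (n_layer=32, n_head_kv=8, head_dim=128, f16 K and V):
echo $(( 2 * 32 * 131072 * 8 * 128 * 2 / 1024 / 1024 ))   # prints 16384
```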
