TheBloke committed on
Commit 463601b
1 Parent(s): 37ba5c3

Upload new k-quant GGML quantised models.

Files changed (1):
  1. README.md +102 -59
README.md CHANGED
@@ -1,8 +1,6 @@
  ---
  inference: false
  license: other
- datasets:
- - jondurbin/airoboros-gpt4
  ---
  
  <!-- header start -->
@@ -33,44 +31,54 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
  ## Repositories available
  
  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/airoboros-13b-gpt4-GPTQ)
- * [4-bit, 5-bit, and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/airoboros-13b-gpt4-GGML)
  * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/airoboros-13b-gpt4-fp16)
  
- ## Prompt template
- 
- Uses the Vicuna 1.1 format:
- 
- ```
- USER: prompt
- ASSISTANT:
- ```
- 
- ## Context length with GGML
- 
- The base Airoboros GPT4 models have an increased context length of 4096.
- 
- However this GGML conversion appears to still have the default 2048 context.
- 
- I have experimented with llama.cpp's `-n 4096` parameter to specify a context of 4096 but it so far always results in gibberish output.
- 
- I will investigate this further and upload a correct model if this proves necessary.
- 
- For now, please assume this GGML to have a context of 2048.
- 
- ## THE FILES IN MAIN BRANCH REQUIRES LATEST LLAMA.CPP (May 19th 2023 - commit 2d5db48)!
- 
- llama.cpp recently made another breaking change to its quantisation methods - https://github.com/ggerganov/llama.cpp/pull/1508
- 
- I have quantised the GGML files in this repo with the latest version. Therefore you will require llama.cpp compiled on May 19th or later (commit `2d5db48` or later) to use them.
  
  ## Provided files
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
  | ---- | ---- | ---- | ---- | ---- | ----- |
- | airoboros-13b-gpt4.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | 4-bit. |
- | airoboros-13b-gpt4.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
- | airoboros-13b-gpt4.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
- | airoboros-13b-gpt4.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | 5-bit. Even higher accuracy, resource usage and slower inference. |
- | airoboros-13b-gpt4.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | 8-bit. Almost indistinguishable from float16. Huge resource use and slow. Not recommended for normal use. |
  
  **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
@@ -80,7 +88,7 @@ I have quantised the GGML files in this repo with the latest version. Therefore
  I use the following command line; adjust for your tastes and needs:
  
  ```
- ./main -t 10 -ngl 32 -m airoboros-13b-gpt4.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "USER": Write a story about llamas\nASSISTANT:"
  ```
  Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.
  
@@ -112,20 +120,22 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
  
- **Patreon special mentions**: Aemon Algiz, Dmitriy Samsonov, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, Jonathan Leane, Talal Aujan, V. Lukas, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Sebastain Graf, Johann-Peter Hartman.
  
  Thank you to all my generous patrons and donaters!
  <!-- footer end -->
  
  # Original model card: Jon Durbin's Airoboros 13B GPT4
  
  ## Overview
  
  This is a fine-tuned 13b parameter LlaMa model, using completely synthetic training data created gpt4 via https://github.com/jondurbin/airoboros
  
- The context size has been increased to 4096.
- 
- The dataset used to fine-tune this model is available [here](https://huggingface.co/airoboros-gpt4), with a specific focus on:
  - trivia
  - math/reasoning (although it still sucks)
  - coding
@@ -136,13 +146,11 @@ The dataset used to fine-tune this model is available [here](https://huggingface
  
  This model was fine-tuned with a fork of FastChat, and therefore uses the standard vicuna template:
  ```
- USER:
- [prompt]
- 
- <\s>
- ASSISTANT:
  ```
  
  The most important bit, to me, is the context obedient question answering support, without extensive prompt engineering.
  
  ### Usage
@@ -158,7 +166,6 @@ python -m fastchat.serve.cli
  --model-path airoboros-13b-gpt4 \
  --temperature 0.5 \
  --max-new-tokens 4096 \
- --context-length 4096 \
  --conv-template vicuna_v1.1 \
  --no-history
  ```
@@ -183,6 +190,23 @@ BEGININSTRUCTION
  ENDINSTRUCTION
  ```
  
  Here's a trivial, but important example to prove the point:
  ```
  BEGININPUT
@@ -204,6 +228,23 @@ Source:
  date: 2021-01-01
  url: https://web.site/123
  ```
  <details>
  <summary>A more elaborate example, with a rewrite of the Michigan Wikipedia article to be fake data.</summary>
  
@@ -251,7 +292,7 @@ Once exploited for logging and mining, today the sparsely populated Upper Penins
  The Lower Peninsula is a hub for manufacturing, forestry, agriculture, services, and high-tech industry.
  ENDINPUT
  BEGININSTRUCTION
- What is Michigan's capital?
  Give me a JSON object with 3 facts from Foolo's article.
  What are the states that border Michigan?
  Reference?
@@ -346,6 +387,8 @@ pr: 23441
  
  </details>
  
  ### Other stuff
  
  #### Better coding
@@ -497,12 +540,12 @@ void insert(char* key, char* value) {
  perror("malloc failed");
  exit(1);
  }
- 
  strcpy(parent->key, key);
  strcpy(parent->value, value);
- 
  parent->left = parent->right = NULL;
- 
  if (!tree) {
  tree = parent;
  } else {
@@ -518,7 +561,7 @@ char* search(char* key) {
  }
  current = current->right;
  }
- 
  return NULL;
  }
  
@@ -533,17 +576,17 @@ void print_tree() {
  void* accept_client(void* args) {
  int client_socket = *(int*)args;
  char buffer[1024];
- 
  while (1) {
  ssize_t bytes_received = recv(client_socket, buffer, sizeof(buffer), 0);
  if (bytes_received <= 0) {
  close(client_socket);
  break;
  }
- 
  char* input = buffer;
  char* input_token;
- 
  while ((input_token = strtok(input, " \n")) != NULL) {
  switch (input_token[0]) {
  case 'G':
@@ -553,7 +596,7 @@ void* accept_client(void* args) {
  }
  printf("%s = %s\n", input_token, search(input_token));
  break;
- 
  case 'S':
  if (!strcmp(input_token, "SET")) {
  char* key = strtok(NULL, " \n");
@@ -562,7 +605,7 @@ void* accept_client(void* args) {
  perror("Invalid input");
  exit(1);
  }
- 
  insert(key, value);
  printf("%s set %s to %s\n", input_token, key, value);
  } else {
@@ -570,16 +613,16 @@ void* accept_client(void* args) {
  exit(1);
  }
  break;
- 
  default:
  perror("Invalid input");
  exit(1);
  }
- 
  input = strtok(NULL, " \n");
  }
  }
- 
  return NULL;
  }
  
@@ -589,50 +632,50 @@ int main() {
  perror("socket failed");
  exit(1);
  }
- 
  struct sockaddr_in server_addr;
  memset(&server_addr, 0, sizeof(server_addr));
  server_addr.sin_family = AF_INET;
  server_addr.sin_port = htons(8080);
  server_addr.sin_addr.s_addr = INADDR_ANY;
- 
  if (bind(server_socket, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) {
  perror("bind failed");
  exit(1);
  }
- 
  if (listen(server_socket, 5) < 0) {
  perror("listen failed");
  exit(1);
  }
- 
  pthread_t accept_thread;
  pthread_create(&accept_thread, NULL, accept_client, &server_socket);
- 
  char* client_input;
  int client_socket = accept(server_socket, (struct sockaddr*)NULL, NULL);
  if (client_socket < 0) {
  perror("accept failed");
  exit(1);
  }
- 
  while (1) {
  sleep(1);
- 
  char buffer[1024];
  ssize_t bytes_received = recv(client_socket, buffer, sizeof(buffer), 0);
  if (bytes_received <= 0) {
  close(client_socket);
  break;
  }
- 
  client_input = buffer;
  parse_input(client_input);
  }
- 
  close(client_socket);
  pthread_join(accept_thread, NULL);
- 
  return 0;
  }
  ```
 
  ---
  inference: false
  license: other
  ---
  
  <!-- header start -->
 
  ## Repositories available
  
  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/airoboros-13b-gpt4-GPTQ)
+ * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/airoboros-13b-gpt4-GGML)
  * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/airoboros-13b-gpt4-fp16)
  
+ <!-- compatibility_ggml start -->
+ ## Compatibility
  
+ ### Original llama.cpp quant methods: `q4_0, q4_1, q5_0, q5_1, q8_0`
  
+ I have quantised these files with the 'original' quantisation methods using an older version of llama.cpp, so that they remain compatible with llama.cpp as of May 19th, commit `2d5db48`.
  
+ They should be compatible with all current UIs and libraries that use llama.cpp, such as those listed at the top of this README.
  
+ ### New k-quant methods: `q2_K, q3_K_S, q3_K_M, q3_K_L, q4_K_S, q4_K_M, q5_K_S, q6_K`
  
+ These new quantisation methods are only compatible with llama.cpp as of June 6th, commit `2d43387`.
  
+ They will NOT be compatible with koboldcpp, text-generation-webui, and other UIs and libraries yet. Support is expected to come over the next few days.
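
For reference, a minimal build sketch for getting a k-quant-capable llama.cpp (assuming a Linux/macOS machine with git and a C compiler; the commit hash is the one quoted above):

```
# Clone llama.cpp and check out the first commit with k-quant support (or any later one).
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 2d43387
# Build the CLI tools; this produces ./main (inference) and ./quantize.
make
```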
  
+ ## Explanation of the new k-quant methods
  
+ The new methods available are:
+ * GGML_TYPE_Q2_K - "type-1" 2-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Block scales and mins are quantized with 4 bits. This ends up effectively using 2.5625 bits per weight (bpw).
+ * GGML_TYPE_Q3_K - "type-0" 3-bit quantization in super-blocks containing 16 blocks, each block having 16 weights. Scales are quantized with 6 bits. This ends up using 3.4375 bpw.
+ * GGML_TYPE_Q4_K - "type-1" 4-bit quantization in super-blocks containing 8 blocks, each block having 32 weights. Scales and mins are quantized with 6 bits. This ends up using 4.5 bpw.
+ * GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K, resulting in 5.5 bpw.
+ * GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw.
+ * GGML_TYPE_Q8_K - "type-0" 8-bit quantization. Only used for quantizing intermediate results. The difference to the existing Q8_0 is that the block size is 256. All 2-6 bit dot products are implemented for this quantization type.
  
+ Refer to the Provided Files table below to see what files use which methods, and how.
+ <!-- compatibility_ggml end -->
  
  ## Provided files
  | Name | Quant method | Bits | Size | Max RAM required | Use case |
  | ---- | ---- | ---- | ---- | ---- | ----- |
+ | airoboros-13b-gpt4.ggmlv3.q4_0.bin | q4_0 | 4 | 7.32 GB | 9.82 GB | Original llama.cpp quant method, 4-bit. |
+ | airoboros-13b-gpt4.ggmlv3.q4_1.bin | q4_1 | 4 | 8.14 GB | 10.64 GB | Original llama.cpp quant method, 4-bit. Higher accuracy than q4_0 but not as high as q5_0. However has quicker inference than q5 models. |
+ | airoboros-13b-gpt4.ggmlv3.q5_0.bin | q5_0 | 5 | 8.95 GB | 11.45 GB | Original llama.cpp quant method, 5-bit. Higher accuracy, higher resource usage and slower inference. |
+ | airoboros-13b-gpt4.ggmlv3.q5_1.bin | q5_1 | 5 | 9.76 GB | 12.26 GB | Original llama.cpp quant method, 5-bit. Even higher accuracy, resource usage and slower inference. |
+ | airoboros-13b-gpt4.ggmlv3.q8_0.bin | q8_0 | 8 | 13.83 GB | 16.33 GB | Original llama.cpp quant method, 8-bit. Almost indistinguishable from float16. High resource use and slow. Not recommended for most users. |
+ | airoboros-13b.ggmlv3.q2_K.bin | q2_K | 2 | 5.43 GB | 7.93 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv and feed_forward.w2 tensors, GGML_TYPE_Q2_K for the other tensors. |
+ | airoboros-13b.ggmlv3.q3_K_L.bin | q3_K_L | 3 | 6.87 GB | 9.37 GB | New k-quant method. Uses GGML_TYPE_Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K. |
+ | airoboros-13b.ggmlv3.q3_K_M.bin | q3_K_M | 3 | 6.25 GB | 8.75 GB | New k-quant method. Uses GGML_TYPE_Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, else GGML_TYPE_Q3_K. |
+ | airoboros-13b.ggmlv3.q3_K_S.bin | q3_K_S | 3 | 5.59 GB | 8.09 GB | New k-quant method. Uses GGML_TYPE_Q3_K for all tensors. |
+ | airoboros-13b.ggmlv3.q4_K_M.bin | q4_K_M | 4 | 7.82 GB | 10.32 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q4_K. |
+ | airoboros-13b.ggmlv3.q4_K_S.bin | q4_K_S | 4 | 7.32 GB | 9.82 GB | New k-quant method. Uses GGML_TYPE_Q4_K for all tensors. |
+ | airoboros-13b.ggmlv3.q5_K_M.bin | q5_K_M | 5 | 9.21 GB | 11.71 GB | New k-quant method. Uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K. |
+ | airoboros-13b.ggmlv3.q5_K_S.bin | q5_K_S | 5 | 8.95 GB | 11.45 GB | New k-quant method. Uses GGML_TYPE_Q5_K for all tensors. |
+ | airoboros-13b.ggmlv3.q6_K.bin | q6_K | 6 | 10.68 GB | 13.18 GB | New k-quant method. Uses GGML_TYPE_Q6_K (6-bit quantization) for all tensors. |
  
  
  **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.
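
To fetch just one of these files rather than cloning the whole repo, one simple option (the URL follows Hugging Face's standard `resolve/main` download path; substitute whichever filename from the table you want):

```
# Download a single GGML file from this repo, here the q4_K_M variant as an example.
wget https://huggingface.co/TheBloke/airoboros-13b-gpt4-GGML/resolve/main/airoboros-13b.ggmlv3.q4_K_M.bin
```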
 
  I use the following command line; adjust for your tastes and needs:
  
  ```
+ ./main -t 10 -ngl 32 -m airoboros-13b-gpt4.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "USER: Write a story about llamas ASSISTANT:"
  ```
  Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.
  
 
  * Patreon: https://patreon.com/TheBlokeAI
  * Ko-Fi: https://ko-fi.com/TheBlokeAI
  
+ **Special thanks to**: Luke from CarbonQuill, Aemon Algiz, Dmitriy Samsonov.
+ 
+ **Patreon special mentions**: Ajan Kanaga, Kalila, Derek Yates, Sean Connelly, Luke, Nathan LeClaire, Trenton Dambrowitz, Mano Prime, David Flickinger, vamX, Nikolai Manek, senxiiz, Khalefa Al-Ahmad, Illia Dulskyi, trip7s trip, Jonathan Leane, Talal Aujan, Artur Olbinski, Cory Kujawski, Joseph William Delisle, Pyrater, Oscar Rangel, Lone Striker, Luke Pendergrass, Eugene Pentland, Johann-Peter Hartmann.
  
  Thank you to all my generous patrons and donaters!
+ 
  <!-- footer end -->
  
  # Original model card: Jon Durbin's Airoboros 13B GPT4
  
+ 
  ## Overview
  
  This is a fine-tuned 13b parameter LlaMa model, using completely synthetic training data created gpt4 via https://github.com/jondurbin/airoboros
  
+ The dataset used to fine-tune this model is available [here](https://huggingface.co/datasets/jondurbin/airoboros-gpt4), with a specific focus on:
  - trivia
  - math/reasoning (although it still sucks)
  - coding
 
  This model was fine-tuned with a fork of FastChat, and therefore uses the standard vicuna template:
  ```
+ USER: [prompt] ASSISTANT:
  ```
  
+ *__NOTE: an earlier version claimed a context length of 4096 - this did not work! I modified the code to train with 4096, and several instructions are beyond 2048. I tested a few prompts beyond 2048, and they seem to produce fairly coherent responses with increased context length for a couple hundred tokens beyond 2048, but I did not properly test up to 4096. As it turns out, it would appear that without a massive fine-tune of the base model on a larger context window, this won't work. Sorry!__*
+ 
  The most important bit, to me, is the context obedient question answering support, without extensive prompt engineering.
  
  ### Usage
  
  --model-path airoboros-13b-gpt4 \
  --temperature 0.5 \
  --max-new-tokens 4096 \
  --conv-template vicuna_v1.1 \
  --no-history
  ```
 
  ENDINSTRUCTION
  ```
  
+ It's also helpful to add "Don't make up answers if you don't know." to your instruction block to make sure that, if the context is completely unrelated, it doesn't make something up.
+ 
+ *The __only__ prompts that need this closed context formatting are closed-context instructions. Normal questions/instructions do not!*
+ 
+ I know it's a bit verbose and annoying, but after much trial and error, using these explicit delimiters helps the model understand where to find the responses and how to associate specific sources with it.
+ - `BEGININPUT` - denotes a new input block
+ - `BEGINCONTEXT` - denotes the block of context (metadata key/value pairs) to associate with the current input block
+ - `ENDCONTEXT` - denotes the end of the metadata block for the current input
+ - [text] - Insert whatever text you want for the input block, as many paragraphs as can fit in the context.
+ - `ENDINPUT` - denotes the end of the current input block
+ - [repeat as many input blocks in this format as you want]
+ - `BEGININSTRUCTION` - denotes the start of the list (or one) instruction(s) to respond to for all of the input blocks above.
+ - [instruction(s)]
+ - `ENDINSTRUCTION` - denotes the end of the instruction set
+ 
+ It sometimes works without `ENDINSTRUCTION`, but by explicitly including that in the prompt, the model better understands that all of the instructions in the block should be responded to.
+ 
  Here's a trivial, but important example to prove the point:
  ```
  BEGININPUT
  
  date: 2021-01-01
  url: https://web.site/123
  ```
+ 
+ The prompt itself should be wrapped in the vicuna 1.1 template if you aren't using fastchat with the conv-template vicuna_v1.1 as described:
+ 
+ ```
+ USER: BEGININPUT
+ BEGINCONTEXT
+ date: 2021-01-01
+ url: https://web.site/123
+ ENDCONTEXT
+ In a shocking turn of events, blueberries are now green, but will be sticking with the same name.
+ ENDINPUT
+ BEGININSTRUCTION
+ What color are blueberries? Source?
+ ENDINSTRUCTION
+ ASSISTANT:
+ ```
+ 
  <details>
249
  <summary>A more elaborate example, with a rewrite of the Michigan Wikipedia article to be fake data.</summary>
250
 
 
292
  The Lower Peninsula is a hub for manufacturing, forestry, agriculture, services, and high-tech industry.
293
  ENDINPUT
294
  BEGININSTRUCTION
295
+ What is Michigan's capital?
296
  Give me a JSON object with 3 facts from Foolo's article.
297
  What are the states that border Michigan?
298
  Reference?
 
387
 
388
  </details>
389
 
390
+ NOTE: Thanks /u/tareq_al_muntasir for testing and finding an issue with many questions and answer pairs in the context. If you ask a question of a document with question answer pairs, it may continue generating beyond your actual question. You can "fix" it by replacing question marks with periods in the input texts. Or, you might be able to add a preamble to the prompt, like "Be sure to only respond to the instructions in the BEGININSTRUCTION block.
391
+
392
  ### Other stuff
393
 
394
  #### Better coding
 
540
  perror("malloc failed");
541
  exit(1);
542
  }
543
+
544
  strcpy(parent->key, key);
545
  strcpy(parent->value, value);
546
+
547
  parent->left = parent->right = NULL;
548
+
549
  if (!tree) {
550
  tree = parent;
551
  } else {
 
561
  }
562
  current = current->right;
563
  }
564
+
565
  return NULL;
566
  }
567
 
 
576
  void* accept_client(void* args) {
577
  int client_socket = *(int*)args;
578
  char buffer[1024];
579
+
580
  while (1) {
581
  ssize_t bytes_received = recv(client_socket, buffer, sizeof(buffer), 0);
582
  if (bytes_received <= 0) {
583
  close(client_socket);
584
  break;
585
  }
586
+
587
  char* input = buffer;
588
  char* input_token;
589
+
590
  while ((input_token = strtok(input, " \n")) != NULL) {
591
  switch (input_token[0]) {
592
  case 'G':
 
596
  }
597
  printf("%s = %s\n", input_token, search(input_token));
598
  break;
599
+
600
  case 'S':
601
  if (!strcmp(input_token, "SET")) {
602
  char* key = strtok(NULL, " \n");
 
605
  perror("Invalid input");
606
  exit(1);
607
  }
608
+
609
  insert(key, value);
610
  printf("%s set %s to %s\n", input_token, key, value);
611
  } else {
 
613
  exit(1);
614
  }
615
  break;
616
+
617
  default:
618
  perror("Invalid input");
619
  exit(1);
620
  }
621
+
622
  input = strtok(NULL, " \n");
623
  }
624
  }
625
+
626
  return NULL;
627
  }
628
 
 
632
  perror("socket failed");
633
  exit(1);
634
  }
635
+
636
  struct sockaddr_in server_addr;
637
  memset(&server_addr, 0, sizeof(server_addr));
638
  server_addr.sin_family = AF_INET;
639
  server_addr.sin_port = htons(8080);
640
  server_addr.sin_addr.s_addr = INADDR_ANY;
641
+
642
  if (bind(server_socket, (struct sockaddr*)&server_addr, sizeof(server_addr)) < 0) {
643
  perror("bind failed");
644
  exit(1);
645
  }
646
+
647
  if (listen(server_socket, 5) < 0) {
648
  perror("listen failed");
649
  exit(1);
650
  }
651
+
652
  pthread_t accept_thread;
653
  pthread_create(&accept_thread, NULL, accept_client, &server_socket);
654
+
655
  char* client_input;
656
  int client_socket = accept(server_socket, (struct sockaddr*)NULL, NULL);
657
  if (client_socket < 0) {
658
  perror("accept failed");
659
  exit(1);
660
  }
661
+
662
  while (1) {
663
  sleep(1);
664
+
665
  char buffer[1024];
666
  ssize_t bytes_received = recv(client_socket, buffer, sizeof(buffer), 0);
667
  if (bytes_received <= 0) {
668
  close(client_socket);
669
  break;
670
  }
671
+
672
  client_input = buffer;
673
  parse_input(client_input);
674
  }
675
+
676
  close(client_socket);
677
  pthread_join(accept_thread, NULL);
678
+
679
  return 0;
680
  }
681
  ```