# deep-todo

Wondering what to do? Not anymore! Generate arbitrary todos.

Source: the todos come from a random selection of (public) repositories I had on my computer.

### Sample

A bunch of todos:

```
----------------------------------------------------------------------------------------------------
0: TODO: should we check the other edges?/
1: TODO: add more information here.
2: TODO: We could also add more general functions in this case to avoid/
3: TODO: It seems strange to have the same constructor when the base set of/
4: TODO: This implementation should be simplified, as it's too complex to handle the/
5: TODO: we should be able to relax the intrinsic if not
6: TODO: Make sure this doesn't go through the next generation of plugins. It would be better if this was
7: TODO: There is always a small number of errors when we have this type/
8: TODO: Add support for 't' values (not 't') for all the constant types/
9: TODO: Check that we use loglef_cxx in the loop*
10: TODO: Support double or double values./
11: TODO: Add tests that verify that this function does not work for all targets/
12: TODO: we'd expect the result to be identical to the same value in terms of
13: TODO: We are not using a new type for 'w' as it does not denote 'y' yet, so we could/
14: TODO: if we had to find a way to extract the source file directly, we would/
15: TODO: this should fold into a flat array that would be/
16: TODO: Check if we can make it work with the correct address./
17: TODO: support v2i with V2R4+
18: TODO: Can a fast-math-flags check be generalized to all types of data? */
19: TODO: Add support for other type-specific VOPs.
```

Generated by:

```
import tensorflow as tf

# `model`, `tokenizer`, and `input_ids` are assumed to be defined already
# (the trained model, its tokenizer, and the encoded prompt).
tf.random.set_seed(0)

# Sample 20 sequences using top-k and nucleus (top-p) sampling.
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=40,
    top_k=50,
    top_p=0.95,
    num_return_sequences=20
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    m = tokenizer.decode(sample_output, skip_special_tokens=True)
    # Keep only the text after the first "TODO" marker.
    m = m.split("TODO")[1].strip()
    print("{}: TODO{}".format(i, m))
```

## TODO

- [ ] Fix up the data; it seems to contain multiple todos per line
- [ ] Preprocess the data in a better way
- [ ] Download GitHub and train it on everything
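
For the first item in the list above, one possible way to split a line that contains several todos into separate entries is sketched below. This is only a rough idea, not the preprocessing used for this repo: `split_todos` is a hypothetical helper name, and it assumes every entry starts with the literal marker `TODO`.

```python
import re

# Hypothetical helper, not part of this repository.
# Assumption: each entry starts with the literal word "TODO" and runs
# until the next "TODO" marker or the end of the line.
_TODO_ENTRY = re.compile(r"\bTODO\b.*?(?=\bTODO\b|$)")


def split_todos(line):
    """Return the TODO entries found in a single line, one string per entry."""
    return [match.strip() for match in _TODO_ENTRY.findall(line)]


if __name__ == "__main__":
    raw = "TODO: add tests TODO: handle empty input"
    for entry in split_todos(raw):
        print(entry)
```

Text before the first marker (for example the code that precedes a trailing comment) is dropped, and a line without any marker yields an empty list, so such lines could be filtered out of the training data in the same pass.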