# deep-todo

Wondering what to do? Not anymore!

Generate arbitrary TODOs.

Source: <https://colab.research.google.com/drive/1PlKLrGHaCuvWCKNC4fmQEMElF-iRec9f?usp=sharing>

The TODOs come from a random selection of (public) repositories I had on my computer.
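
For illustration, collecting TODO lines from a local checkout could look roughly like the sketch below. It is not taken from the notebook: the `collect_todos` helper, the regular expression, and the list of file suffixes are all assumptions made for this example.

```python
import re
from pathlib import Path

# Assumed pattern: everything from "TODO" to the end of the line.
TODO_RE = re.compile(r"TODO.*")


def collect_todos(root, suffixes=(".py", ".c", ".cpp", ".h", ".js")):
    """Yield one stripped TODO string per matching line under `root`.

    `suffixes` is a hypothetical filter; adjust it to the languages
    that actually appear in the repositories.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            # Skip files that cannot be read.
            continue
        for line in text.splitlines():
            match = TODO_RE.search(line)
            if match:
                yield match.group(0).strip()
```

Calling `list(collect_todos("path/to/repos"))` would then give one string per source line that contains a TODO, ready to be written to a training file.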


## Sample

A bunch of generated TODOs:

```
----------------------------------------------------------------------------------------------------
0: TODO: should we check the other edges?/
1: TODO: add more information here.
2: TODO: We could also add more general functions in this case to avoid/
3: TODO: It seems strange to have the same constructor when the base set of/
4: TODO: This implementation should be simplified, as it's too complex to handle the/
5: TODO: we should be able to relax the intrinsic if not
6: TODO: Make sure this doesn't go through the next generation of plugins.  It would be better if this was
7: TODO: There is always a small number of errors when we have this type/
8: TODO: Add support for 't' values (not 't') for all the constant types/
9: TODO: Check that we use loglef_cxx in the loop*
10: TODO: Support double or double values./
11: TODO: Add tests that verify that this function does not work for all targets/
12: TODO: we'd expect the result to be identical to the same value in terms of
13: TODO: We are not using a new type for 'w' as it does not denote 'y' yet, so we could/
14: TODO: if we had to find a way to extract the source file directly, we would/
15: TODO: this should fold into a flat array that would be/
16: TODO: Check if we can make it work with the correct address./
17: TODO: support v2i with V2R4+
18: TODO: Can a fast-math-flags check be generalized to all types of data? */
19: TODO: Add support for other type-specific VOPs.
```

Generated by:

```
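import tensorflow as tf

# `model`, `tokenizer`, and `input_ids` are assumed to be defined already
# (see the notebook linked above).
tf.random.set_seed(0)  # make sampling reproducible

sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=40,
    top_k=50,
    top_p=0.95,
    num_return_sequences=20,
)

print("Output:\n" + 100 * "-")
for i, sample_output in enumerate(sample_outputs):
    m = tokenizer.decode(sample_output, skip_special_tokens=True)
    # Keep only the text that follows the first "TODO" marker.
    m = m.split("TODO")[1].strip()
    print("{}: TODO{}".format(i, m))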
```


## TODO

- [ ] Fix up the data; it seems to contain multiple TODOs per line
- [ ] Preprocess the data in a better way
- [ ] Download GitHub and train on everything
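
A possible starting point for the first item is to split any line that holds several TODOs into one entry each. The `split_todos` helper below is hypothetical and only a sketch; it assumes every entry starts with the literal marker `TODO` and drops any text before the first marker.

```python
import re


def split_todos(line):
    """Return one stripped string per "TODO" marker found in `line`."""
    # Lazily match from each "TODO" up to the next "TODO" or the line end.
    parts = re.findall(r"TODO.*?(?=TODO|$)", line)
    return [part.strip() for part in parts]


print(split_todos("TODO: add tests TODO: remove this hack"))
```

Lines without a marker yield an empty list, so they can simply be filtered out of the training data.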