# deep-todo
Wondering what to do? Not anymore!
Generate arbitrary todos.
Source: <https://colab.research.google.com/drive/1PlKLrGHaCuvWCKNC4fmQEMElF-iRec9f?usp=sharing>
The todos come from a random selection of (public) repositories I had on my computer.
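The actual extraction lives in the linked Colab; as a rough sketch of what collecting such a dataset could look like (the regex, root path, and helper name here are illustrative, not the exact procedure used):

```python
import os
import re

# Grab everything from "TODO" to the end of the line.
TODO_RE = re.compile(r"TODO.*")

def collect_todos(root):
    """Walk `root` and collect every line fragment starting at 'TODO'."""
    todos = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="ignore") as f:
                    for line in f:
                        match = TODO_RE.search(line)
                        if match:
                            todos.append(match.group(0).strip())
            except OSError:
                continue  # skip unreadable files
    return todos

# e.g. collect_todos(os.path.expanduser("~/code"))
```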
## Sample
A bunch of generated todos:
```
----------------------------------------------------------------------------------------------------
0: TODO: should we check the other edges?/
1: TODO: add more information here.
2: TODO: We could also add more general functions in this case to avoid/
3: TODO: It seems strange to have the same constructor when the base set of/
4: TODO: This implementation should be simplified, as it's too complex to handle the/
5: TODO: we should be able to relax the intrinsic if not
6: TODO: Make sure this doesn't go through the next generation of plugins. It would be better if this was
7: TODO: There is always a small number of errors when we have this type/
8: TODO: Add support for 't' values (not 't') for all the constant types/
9: TODO: Check that we use loglef_cxx in the loop*
10: TODO: Support double or double values./
11: TODO: Add tests that verify that this function does not work for all targets/
12: TODO: we'd expect the result to be identical to the same value in terms of
13: TODO: We are not using a new type for 'w' as it does not denote 'y' yet, so we could/
14: TODO: if we had to find a way to extract the source file directly, we would/
15: TODO: this should fold into a flat array that would be/
16: TODO: Check if we can make it work with the correct address./
17: TODO: support v2i with V2R4+
18: TODO: Can a fast-math-flags check be generalized to all types of data? */
19: TODO: Add support for other type-specific VOPs.
```
Generated by:
```python
import tensorflow as tf

# `model`, `tokenizer`, and `input_ids` come from earlier cells in the
# linked notebook.
tf.random.set_seed(0)

sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=40,
    top_k=50,
    top_p=0.95,
    num_return_sequences=20,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    m = tokenizer.decode(sample_output, skip_special_tokens=True)
    m = m.split("TODO")[1].strip()
    print("{}: TODO{}".format(i, m))
```
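With `do_sample=True` each token is sampled rather than taken greedily; `top_k=50` and `top_p=0.95` restrict sampling to the 50 most likely candidates, further trimmed to the smallest set whose cumulative probability reaches 0.95, which keeps the todos varied without drifting into nonsense. `num_return_sequences=20` produces the twenty samples shown above, each capped at `max_length=40` tokens.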
## TODO
- [ ] Fix up the data; it seems to contain multiple todos per line
- [ ] Preprocess the data in a better way
- [ ] Download GitHub and train on everything