Geneformer / geneformer / tokenizer.py

Commit History

edit docs formatting
ef094b2

ctheodoris committed on

update tokenizer to defaults for 95M models for special token and input size
da8cf3d
verified

ctheodoris committed on

precommit formatting
f07bfd7

ctheodoris committed on

move dicts to init
ea428cb

ctheodoris committed on

update tokenizer to include eos token
ead0550

Christina Theodoris committed on

fix cell state gene embeddings bug (#345)
c0e7b19
verified

ctheodoris committed on

patch datasets save_to_disk
75c67a1

Christina Theodoris committed on

correct typo
5a43832
verified

ctheodoris committed on

Update readthedocs for classifier
f75f5ac

Christina Theodoris committed on

Get the gene keys and gene list keys from the token dictionary instead of medians (#304)
b294421
verified

ctheodoris, hchen725 committed on

Add classifier module and examples
9e9cca9

Christina Theodoris committed on

Add option for variable input_size and to add CLS/SEP Tokens (#299)
aa25cd2
verified

ctheodoris, hchen725 committed on

edit docstring format to highlight options
e3330a6

Christina Theodoris committed on

change doc formatting
17f036a

Christina Theodoris committed on

add sphinx docs
2a0dcbe

Christina Theodoris committed on

Add option for modified batch size for loom tokenizer
0960cf6

Christina Theodoris committed on

Add option for modifying chunk size for anndata tokenizer
fd93ebf

Christina Theodoris committed on

Add error for no files found and suppress loompy import warning
abdf980

Christina Theodoris committed on

Update tokenizer to allow tokenization without custom cell attributes
57b9778

Christina Theodoris committed on

Modify tokenizer to allow renaming attr names btwn loom and .dataset
e78c44d

Christina Theodoris committed on

Add further explanation regarding input file format for transcriptome tokenizer
c34ead6

Christina Theodoris committed on

Add further explanation to tokenizer example script and updated tokenizer to match loompy raised error
78dd83b

Christina Theodoris committed on

Fix bug with metadata when processing multiple .loom files (#3)
044d737

ctheodoris, davidjwen committed on

Add data collator for cell classification and example for cell classification
088ea6e

Christina Theodoris committed on

Add Geneformer tokenizer and updated model card
5426788

Christina Theodoris committed on