Update README.md

README.md +1 -1

@@ -140,7 +140,7 @@ My initial attempt focused on training a sentence transformer based on SELFIES,
 The next challenges were how to properly make molecule pairs that is diverse yet informative, and how to label them. After tackling those, I trained the model on a dataset built from natural compounds taken from [COCONUTDB](https://coconut.naturalproducts.net/). After some initial training, I pushed [the model to Hugging Face](https://huggingface.co/gbyuvd/ChemEmbed-v01) to get some feedback. Gladly, [Tom Aarsen](https://huggingface.co/tomaarsen) provided [valuable suggestions](https://huggingface.co/gbyuvd/ChemEmbed-v01/discussions/1), including training a custom tokenizer, exploring [Matryoshka embeddings](https://huggingface.co/blog/matryoshka), and considering training from scratch. The attempt to implement Tom's suggestions, specifically in training from scratch is the main goal of this project as well as a first experience for me.
-Lastly before going into the details, it's important to note that this is the result of a hands-on learning project, and as such - beside my insufficient knowledge -  it may not meet rigorous scientific standards. Like any learning journey, it's messy and I myself constrained by financial, computational, and time limitations. I've had to make compromises, such as conducting incomplete experiments and chunking datasets. However, I'm eager to receive any feedback, so that I can improve both myself and future models/projects. A more detailed article discussing this project in details is coming soon.
+Lastly before going into the details, it's important to note that this is the result of a hands-on learning project, and as such - beside my insufficient knowledge -  it may not meet rigorous scientific standards. Like any learning journey, it's messy and I myself constrained by financial, computational, and time limitations. I've had to make compromises, such as conducting incomplete experiments and chunking datasets. However, I am more than happy to receive any feedback, so that I can improve both myself and future models/projects. A more detailed article discussing this project in details is coming soon.
 ## Training Details